Welcome to the BEHAV3D Tumor Profiler Demo. Here you can find a demo
example of how the Behaved Tumor Profiler pipeline works and what each
of the outputs can provide.
Tu run the pipeline for yourself, please refer to the Github
page, where you can find a wider explanation of the pipeline, as
well as the Google
Colab and R script versions.
Firstly all the necessary libraries must be installed
library(dplyr)
library(dtwclust)
library(e1071)
library(emmeans)
library(ggplot2)
library(gtools)
library(parallel)
library(pheatmap)
library(plotly)
library(plyr)
library(RANN)
library(readr)
library(reshape2)
library(scales)
library(sp)
library(spatstat)
library(stats)
library(tidyr)
library(umap)
library(viridis)
library(zoo)
We set a seed for reproducibility:
set.seed(123)
Input data can be obtained from image analysis software like Imaris (Oxford Instruments), where you extract the relevant features for each cell types. Each cell type must be inputted separately into the pipeline.
Input data must follow the following format:
Mouse_CellType_timelapse_LargeScaleRegion_Feature.csv
| Our analyzed Cell types | |||
|---|---|---|---|
| Tumor Cells | SR101 | MG (Microglia) | BV (Blood Vessel) |
Example:
2430F13_SR101_timelapse_CL3_2_Distance.csv
Mouse information includes: 2430 -> Mother |
F -> Sex | 13 -> Day
LargeScaleRegion information includes: CL2 -> Class
(Environmental Cluster) | 2430F13.CL2_2 or just
CL2_2 -> Position
These are the features to be uploaded:
DisplacementDistance_tumorDisplacement_Delta_LengthDisplacement_LengthPositionSpeedTime
The data is outputed by softwares like Imaris in a .csv
format.
For example, a Tumor cells 2430F11 CL1_1 displacement file would look like this (Each row represents a single cell track):
And a SR101 2430F16 CL1_1 Position file would look like this:
There are two ways to start or continue your analysis in BEHAV3D Tumor Profiler:
We need to import the dataset and classifying the information into
distinct dataframes
Create a reading function, that reads input from Imaris software:
## function to import the stats of the tracked tumor cells data from csv files extracted by
## Imaris
read_plus <- function(flnm) {
read_csv(flnm, skip = 3, col_types = cols()) %>%
mutate(filename = flnm)
}
Now we can import the datafiles of interest. We are basically extracting
the information from .csv files and creating the
corresponding dataframes
## set directory where csv files are located
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/Tracked_tumor_cells"
setwd(working_directory)
# import volume per organoid object
pat = "*Displacement"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
files <- files[seq(1, length(files), 3)]
disp_csv <- ldply(files, read_plus)
# import distance to tumor
pat = "*Distance_tumor"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dist_tumor_csv <- ldply(files, read_plus)
# import area per organoid object
pat = "*Displacement_Delta_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_delta_csv <- ldply(files, read_plus)
# import area per organoid object
pat = "*Displacement_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_length_csv <- ldply(files, read_plus)
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import speed
pat = "*Speed"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
speed_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each ID unique (IDs imported from different imaging files can be repeated):
category <- as.factor(dist_tumor_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dist_tumor_csv <- left_join(dist_tumor_csv, ranks)
dist_tumor_csv$ID2 <- with(dist_tumor_csv, interaction(ID, ranks))
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
category <- as.factor(speed_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
speed_csv <- left_join(speed_csv, ranks)
speed_csv$ID2 <- with(speed_csv, interaction(ID, ranks))
category <- as.factor(dipl_length_csv$filename) ##this measures how much distnace did a cell move over, a cell can mover over a big distance and stay at the same position
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_length_csv <- left_join(dipl_length_csv, ranks)
dipl_length_csv$ID2 <- with(dipl_length_csv, interaction(ID, ranks))
category <- as.factor(dipl_delta_csv$filename) ## this variable measures how much distance did the cell move from its origin
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_delta_csv <- left_join(dipl_delta_csv, ranks)
dipl_delta_csv$ID2 <- with(dipl_delta_csv, interaction(ID, ranks))
category <- as.factor(disp_csv$filename) ## a bit similar to the previous ones, this measures the area that a cell covers overtime
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
disp_csv <- left_join(disp_csv, ranks)
disp_csv$ID2 <- with(disp_csv, interaction(ID, ranks))
Firstly, we create the dataframe containing the statistics of
interest
detach(package:plyr)
## create dataframe containing the statistics of interest
master_1 <- left_join(dist_tumor_csv[, c(1, 13)], time_csv[, c(1, 5, 6, 8, 10, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, disp_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, dipl_delta_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, dipl_length_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master_1 <- left_join(master_1, speed_csv[, c(1, 11)], by = c(ID2 = "ID2"))
master <- left_join(master_1, pos_csv[, c(1, 2, 3, 14)], by = c(ID2 = "ID2"))
colnames(master) <- c("dist_tumor", "ID2", "Time", "Tracked", "TrackID", "filename", "ranks", "disp2",
"disp_d", "disp_l", "speed", "x", "y", "z")
# Extract mouse data (ID) using regexp (mother+sex+day)
master$mouse <- master$filename
master$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master$mouse, perl = TRUE)
# Extract position metadata
master$position <- master$filename
master$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master$position, perl = TRUE)
master$position <- with(master, interaction(mouse, position))
# Extract class metadata
master$class <- master$position
master$class <- gsub(".*(CL\\d+)_.*", "\\1", master$class, perl = TRUE)
master$filename <- NULL
## Round the time
master$Time <- round(master$Time, digits = 2)
## check that the imported tracks look ok
g <- ggplot(master) + geom_path(aes(x, y, group = TrackID), size = 0.2, arrow = arrow(ends = "last",
type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
position, scales = "free")
g
, the Blood Vessel information is extracted:
### Import BLOOD VESSELS stats (does not exist for all tracked positions so needs to be
### processed separtely.)
working_directory2 <- "D:/BEHAV3D_Tumor_Profiler/data/BV_stats"
setwd(working_directory2)
files <- list.files(path = working_directory2, full.names = T, recursive = TRUE)
BV_df <- ldply(files, read_plus)
### Process blood vessels data:
BV_df <- BV_df[c(1, 6, 7, 8, 9, 11)]
colnames(BV_df) <- c("dist_BV", "Time", "Tracked", "TrackID", "ID", "filename")
## rename mouse accroding to filename:
BV_df$mouse <- BV_df$filename
BV_df$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", BV_df$mouse, perl = TRUE)
BV_df$position <- BV_df$filename
BV_df$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", BV_df$position, perl = TRUE)
BV_df$position <- with(BV_df, interaction(mouse, position))
BV_df$class <- BV_df$position
BV_df$class <- gsub(".*(CL\\d+)_.*", "\\1", BV_df$class, perl = TRUE)
## create ranks column based on master ranks:
ranks_master <- subset(master, position %in% BV_df$position)
ranks_master <- ranks_master[!duplicated(ranks_master$position), c(6, 15)]
colnames(ranks_master)[1] <- "ranks_bv"
BV_df <- left_join(BV_df, ranks_master)
## keep only tracked cells and with a TrackID:
BV_df <- subset(BV_df, Tracked == "Tracked")
## remove any NA:
BV_df <- na.omit(BV_df)
## create unique Track_ID for Blood vessels dataframe:
BV_df$Track2 <- with(BV_df, interaction(ranks_bv, TrackID))
BV_df$Track2 <- gsub(".", "", BV_df$Track2, fixed = T)
BV_df$Track2 <- as.numeric(as.character(BV_df$Track2))
## sumarize per TrackID the average, min and max distance to blood vessels
ggplot(BV_df, aes(dist_BV)) + geom_histogram() + facet_wrap(. ~ position)
BV_df <- na.omit(BV_df)
BV_df$BV_contact <- ifelse(BV_df$dist_BV < 4, 0, 1)
BV_df_sum <- BV_df %>%
group_by(Track2) %>%
summarise(BV_sd = sd(dist_BV)/sqrt(length(dist_BV)), BV_mean = mean(dist_BV), BV_min = min(dist_BV),
BV_max = max(dist_BV), BV_contact = mean(BV_contact))
library(plyr)
## import Sr101 cells for the populations that have them
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/SR101_objects"
setwd(working_directory)
# import volume per organoid object
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each TrackID unique (TrackIDs imported from different imaging files can be
### repeated):
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
detach(package:plyr)
## create dataframe containing the statistics of interest
master_SR101 <- left_join(pos_csv[, c(1, 2, 3, 10, 12)], time_csv[, c(1, 9)], by = c(ID2 = "ID2"))
colnames(master_SR101) <- c("x", "y", "z", "filename", "ID2", "Time")
## rename mouse accroding to filename: Extract mouse data (ID) using regexp (mother+sex+day)
master_SR101$mouse <- master_SR101$filename
master_SR101$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_SR101$mouse, perl = TRUE)
# Extract position metadata
master_SR101$position <- master_SR101$filename
master_SR101$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_SR101$position, perl = TRUE)
master_SR101$position <- with(master_SR101, interaction(mouse, position))
# Extract class metadata
master_SR101$class <- master_SR101$position
master_SR101$class <- gsub(".*(CL\\d+)_.*", "\\1", master_SR101$class, perl = TRUE)
master_SR101$filename <- NULL
## Convert imported time to hours:
master_SR101$Time <- round(master_SR101$Time, digits = 2)
## import MG cells for the populations that have them
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/MG_objects"
setwd(working_directory)
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each TrackID unique (TrackIDs imported from different imaging files can be
### repeated):
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method = "first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
detach(package:plyr)
## create dataframe containing the statistics of interest
master_MG <- left_join(pos_csv[, c(1, 2, 3, 10, 12)], time_csv[, c(1, 9)], by = c(ID2 = "ID2"))
colnames(master_MG) <- c("x", "y", "z", "filename", "ID2", "Time")
## rename mouse accroding to filename: Extract mouse data (ID) using regexp (mother+sex+day)
master_MG$mouse <- master_MG$filename
master_MG$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_MG$mouse, perl = TRUE)
# Extract position metadata
master_MG$position <- master_MG$filename
master_MG$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_MG$position, perl = TRUE)
master_MG$position <- with(master_MG, interaction(mouse, position))
# Extract class metadata
master_MG$class <- master_MG$position
master_MG$class <- gsub(".*(CL\\d+)_.*", "\\1", master_MG$class, perl = TRUE)
master_MG$filename <- NULL
## Convert imported time to hours:
master_MG$Time <- round(master_MG$Time, digits = 2)
Now we need to join the information from all of the sources into a
single dataframe
We first calulate the interactions between the nearest cells:
### test that tracks imported are fine:
master_test <- master %>%
group_by(position) %>%
mutate(x = x - min(x), y = y - min(y))
g <- ggplot(master) + geom_path(aes(x, y, group = TrackID, colour = dist_tumor), arrow = arrow(ends = "last",
type = "closed", length = unit(0.005, "inches"))) + theme_bw()
g <- g + theme(legend.position = "none") + facet_wrap(. ~ position, scales = "free")
g
## To calculate interactions between cells
List2 = list()
for (m in unique(master$position)) {
distance_M4 <- master[which(master$position == m), ]
List = list()
for (i in unique(distance_M4$Time)) {
distancei <- distance_M4[distance_M4$Time == i, ]
distancei2 <- as.data.frame(distancei[, c(3, 11, 12, 13)])
coordin <- ppx(distancei2, coord.type = c("t", "s", "s", "s"))
dist1 <- nndist(coordin)
dist2 <- nndist(coordin, k = 2)
dist3 <- nndist(coordin, k = 3)
dist4 <- nndist(coordin, k = 4)
dist5 <- nndist(coordin, k = 5)
dist6 <- nndist(coordin, k = 6)
dist7 <- nndist(coordin, k = 7)
dist8 <- nndist(coordin, k = 8)
dist9 <- nndist(coordin, k = 9)
dist10 <- nndist(coordin, k = 10)
dist <- cbind(dist1, dist2, dist3, dist4, dist5, dist6, dist7, dist8, dist9, dist10) ### calculate distance to the three nearest neighbors
dist <- data.frame(rowMeans(dist[, c(1:3)]), rowMeans(dist))
distanceDFi_dist <- cbind(distancei, dist)
List[[length(List) + 1]] <- distanceDFi_dist ## store to list object
}
master_dist <- do.call(rbind, List) ## convert List to dta
List2[[length(List2) + 1]] <- master_dist
}
master_distance <- do.call(rbind, List2)
colnames(master_distance)[c(17, 18)] <- c("dist_3_neigh", "dist_10_neigh")
### remove the untracked tracks:
master_distance <- subset(master_distance, Tracked == "Tracked")
## create a unique track_ID for master:
master_distance$Track2 <- with(master_distance, interaction(ranks, TrackID))
master_distance$Track2 <- gsub(".", "", master_distance$Track2, fixed = T)
master_distance$Track2 <- as.numeric(as.character(master_distance$Track2))
## check that the imported tracks look ok
g <- ggplot(master_distance) + geom_path(aes(x, y, group = Track2), size = 0.2, arrow = arrow(ends = "last",
type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
position, scales = "free")
g
## For SR101
master_distance_SR101 <- subset(master_distance, mouse %in% unique(master_SR101$mouse))
### Calculate distance from Tumor cells to SR101. Considering that olig2 doens;t move much,
### let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of SR101 objects:
brightest_SR101 <- master_SR101 %>%
group_by(Time, position) %>%
summarise(n = n()) %>%
group_by(position) %>%
filter(n == max(n))
master_SR101 <- subset(master_SR101, Time %in% brightest_SR101$Time)
List2 <- list()
for (m in unique(master_distance_SR101$position)) {
master_pos <- master_distance_SR101 %>%
filter(position == m) %>%
group_by(Time) %>%
arrange(Time)
master_SR101_pos <- master_SR101 %>%
filter(position == m) %>%
group_by(Time) %>%
arrange(Time)
List1 = list() ## create list for ID of the contacting objects
for (t in unique(master_pos$Time)) {
master_pos_t <- master_pos %>%
filter(Time == t)
### calculate distance to 5 neigrest neighorbing SR101 in a radius of 40um
cells_radius <- nn2(data = master_SR101_pos[c("x", "y", "z")], query = master_pos_t[c("x",
"y", "z")], k = 30, treetype = "bd", searchtype = "radius", radius = 30)
cells_min <- nn2(data = master_SR101_pos[c("x", "y", "z")], query = master_pos_t[c("x",
"y", "z")], k = 1, treetype = "bd", searchtype = "standard")
dist_table = data.frame(master_pos_t[c("Time", "Track2", "mouse", "position", "class")],
cells_min[["nn.dists"]], cells_radius[["nn.dists"]])
List1[[length(List1) + 1]] <- dist_table
rm(temp)
}
cell_radius_t <- do.call(rbind, List1)
List2[[length(List2) + 1]] <- cell_radius_t
rm(cell_radius_t)
}
master_dist_SR101 <- do.call(rbind, List2)
### convert the distant cells to 0
temp2 <- master_dist_SR101[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11",
"X12", "X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25",
"X26", "X27", "X28", "X29", "X30")]
temp2[temp2 > 1.340781e+153] <- 0
### Calculate number of contacts:
master_dist_SR101$n_SR101 <- rowSums(temp2 != 0)
### renamen min_SR101:
colnames(master_dist_SR101)[colnames(master_dist_SR101) == "cells_min...nn.dists..."] <- "min_SR101"
### bind information on SR101 distnaces to the main dataframe:
master_dist_SR101[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", "X13",
"X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26", "X27",
"X28", "X29", "X30")] <- NULL
## For each Track2 find what is the minimal distance to SR101 and the max n of SR101
## neiighbors in 30um diameter:
master_distance_SR101 <- master_dist_SR101 %>%
group_by(Track2, mouse, position, class) %>%
summarize(n_SR101 = max(n_SR101), min_SR101 = min(min_SR101))
### plot per position types the values of
p <- ggplot(master_distance_SR101) + geom_jitter(aes(x = class, y = n_SR101)) + geom_bar(aes(x = class,
y = n_SR101), alpha = 0.5, stat = "summary") + ggtitle("number of neighboring SR101 cells per position type") +
theme_classic() + theme(aspect.ratio = 0.7)
p
p <- ggplot(master_distance_SR101) + geom_jitter(aes(x = class, y = min_SR101)) + geom_boxplot(aes(x = class,
y = min_SR101), alpha = 0.5, outlier.colour = NA) + ggtitle("min distance to SR101 cells per position type") +
theme_classic() + theme(aspect.ratio = 0.7)
p
## For MG
master_distance_MG <- subset(master_distance, mouse %in% unique(master_MG$mouse))
### Calculate distance from Tumor cells to MG. Considering that olig2 doens;t move much,
### let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of MG objects:
brightest_MG <- master_MG %>%
group_by(Time, position) %>%
summarise(n = n()) %>%
group_by(position) %>%
filter(n == max(n))
master_MG <- subset(master_MG, Time %in% brightest_MG$Time)
List2 <- list()
for (m in unique(master_distance_MG$position)) {
master_pos <- master_distance_MG %>%
filter(position == m) %>%
group_by(Time) %>%
arrange(Time)
master_MG_pos <- master_MG %>%
filter(position == m) %>%
group_by(Time) %>%
arrange(Time)
List1 = list() ## create list for ID of the contacting objects
for (t in unique(master_pos$Time)) {
master_pos_t <- master_pos %>%
filter(Time == t)
### calculate distance to 5 neigrest neighorbing MG in a radius of 30um
cells_radius <- nn2(data = master_MG_pos[c("x", "y", "z")], query = master_pos_t[c("x",
"y", "z")], k = 30, treetype = "bd", searchtype = "radius", radius = 30)
cells_min <- nn2(data = master_MG_pos[c("x", "y", "z")], query = master_pos_t[c("x", "y",
"z")], k = 1, treetype = "bd", searchtype = "standard")
dist_table = data.frame(master_pos_t[c("Time", "Track2", "mouse", "position", "class")],
cells_min[["nn.dists"]], cells_radius[["nn.dists"]])
List1[[length(List1) + 1]] <- dist_table
rm(temp)
}
cell_radius_t <- do.call(rbind, List1)
List2[[length(List2) + 1]] <- cell_radius_t
rm(cell_radius_t)
}
master_dist_MG <- do.call(rbind, List2)
### convert the distant cells to 0
temp2 <- master_dist_MG[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12",
"X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26",
"X27", "X28", "X29", "X30")]
temp2[temp2 > 1.340781e+153] <- 0
### Calculate number of contacts:
master_dist_MG$n_MG <- rowSums(temp2 != 0)
### renamen min_MG:
colnames(master_dist_MG)[colnames(master_dist_MG) == "cells_min...nn.dists..."] <- "min_MG"
### bind information on MG distnaces to the main dataframe:
master_dist_MG[c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", "X13",
"X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", "X22", "X23", "X24", "X25", "X26", "X27",
"X28", "X29", "X30")] <- NULL
## For each Track2 find what is the minimal distance to MG and the max n of MG neiighbors in
## 30um diameter:
master_distance_MG <- master_dist_MG %>%
group_by(Track2, mouse, position, class) %>%
summarize(n_MG = max(n_MG), min_MG = min(min_MG))
### plot per position types the values of
p <- ggplot(master_distance_MG) + geom_jitter(aes(x = class, y = n_MG)) + geom_bar(aes(x = class,
y = n_MG), alpha = 0.5, stat = "summary") + ggtitle("number of neighboring MG cells per position type") +
theme_classic() + theme(aspect.ratio = 0.7)
p
p <- ggplot(master_distance_MG) + geom_jitter(aes(x = class, y = min_MG)) + geom_boxplot(aes(x = class,
y = min_MG), alpha = 0.5, outlier.colour = NA) + ggtitle("min distance to MG cells per position type") +
theme_classic() + theme(aspect.ratio = 0.7)
p
Time of interest selection and substitution of NA values:
### create empyt dataframe with combination of Time of interest:
max(master_distance$Time)
## [1] 5.44
length(seq(from = 0, to = 5.6, by = 0.2))
## [1] 29
empty_df = master_distance[1:29, ] ### create a dataframe with the same size as the sequence of Time of interest:
empty_df[!is.na(empty_df)] <- NA ## make it empyt
empty_df$Time <- seq(from = 0, to = 5.6, by = 0.2)
empty_df$Track2 <- 0.5
### now merge this fake dataframe with my dataframe of interest:
master_distance <- rbind(master_distance, empty_df)
### complete all the datasets so that they have the same timepoints:
master_distance <- master_distance %>%
complete(Track2, Time)
### remove the fake track that is not necessary anymore:
master_distance <- subset(master_distance, !Track2 == 0.5)
## round the number, otherwise they are not equal
master_distance$Time <- round(master_distance$Time, 2)
## if there is any track2 and time that was duplicated remove the NA Assuming your dataframe
## is named your_dataframe
master_distance <- master_distance %>%
group_by(Time, Track2) %>%
dplyr::slice(if (n() == 1) 1 else which(!is.na(dist_tumor)))
### create a for loop to refill all the NA with interpolated values: create a list of the
### columns names that need to be refilled
column_names <- names(master_distance)
column_names <- subset(column_names, !column_names %in% c("TrackID", "Track2", "ranks", "ID", "ID2",
"Time", "position", "mouse", "class", "pos", "Tracked", "speed"))
## create a first dataset with refilled values for speed:
time_series <- acast(master_distance, Time ~ Track2, value.var = "speed", fun.aggregate = mean)
## rownames timepoints:
row.names(time_series) <- unique(master_distance$Time)
## get rid of NA
time_series_zoo <- zoo(time_series, row.names(time_series))
time_series_zoo <- na.approx(time_series_zoo) ## replace by interpolated value
time_series <- as.matrix(time_series_zoo)
time_series2 <- melt(time_series)
data <- time_series2[complete.cases(time_series2), ]
colnames(data) <- c("Time", "Track2", "speed")
### ----------
for (i in column_names) {
time_series <- acast(master_distance, Time ~ Track2, value.var = i, fun.aggregate = mean)
row.names(time_series) <- unique(master_distance$Time)
## get rid of NA
time_series_zoo <- zoo(time_series, row.names(time_series))
time_series_zoo <- na.approx(time_series_zoo) ## replace by last value
time_series <- as.matrix(time_series_zoo)
time_series2 <- melt(time_series)
new <- time_series2[complete.cases(time_series2), ]
data[, ncol(data) + 1] <- new[3] # Append new column
colnames(data)[ncol(data)] <- paste0(i)
}
### check that the data is correct by plotting the evolution of a certain variable overtime
### -----
p1 <- ggplot(data, aes(Time, disp2, group = Track2, color = as.factor(Track2))) + geom_smooth(method = "loess",
size = 0.5, se = F, alpha = 0.3, span = 1) + theme_bw() + theme(axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 5), axis.title.y = element_text(size = 10), axis.title.x = element_text(size = 15),
legend.text = element_text(size = 10)) + ggtitle("disp2") + theme(aspect.ratio = 1, legend.position = "none")
p1
Then, metadata information is added to the analysis:
### add metadata info
master_cor <- data
master_metadata <- master_distance[c("position", "mouse", "class", "Track2")]
master_metadata <- na.omit(master_metadata)
master_metadata <- master_metadata[!duplicated(master_metadata$Track2), ]
# detach(package:plyr, unload = TRUE)
master_cor <- left_join(master_cor, master_metadata)
### Filter timepoints of interest with same time interval:
time_of_interest <- as.character(seq(from = 0, to = 5.6, by = 0.2)) ### convert to character
master_cor$Time <- as.character(master_cor$Time)
master_cor <- subset(master_cor, Time %in% time_of_interest)
master_cor$Time <- as.numeric(master_cor$Time) ### now back to numeric
## check that the imported tracks look ok
g <- ggplot(master_cor) + geom_path(aes(x, y, group = Track2), size = 0.2, arrow = arrow(ends = "last",
type = "open", length = unit(0.01, "inches"))) + theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~
position, scales = "free")
g
Finally, the generated dataframes are exported to use as
Checkpoints when repeating the analysis:
Set working directory to export dataframes
setwd("D:/BEHAV3D_Tumor_Profiler/results/rds")
All imported information:
saveRDS(master_distance, "master_distance.rds")
Time of interest subset information:
saveRDS(master_cor, "master_cor.rds")
Microglia information:
saveRDS(master_distance_MG, "master_distance_MG.rds")
SR101 information:
saveRDS(master_distance_SR101, "master_distance_SR101.rds")
Blood Vessel information:
saveRDS(BV_df_sum, "BV_df_sum.rds")
.rds files obtained.If the RDS files are in the working directory, the BEHAV3D Tumor
Profiler script will skip the data input and initial pre-processing, by
using the provided .rds files.
setwd("D:/BEHAV3D_Tumor_Profiler/results")
master_cor <- readRDS("master_cor.rds")
master_distance <- readRDS("master_distance.rds")
master_distance_MG <- readRDS("master_distance_MG.rds")
master_distance_SR101 <- readRDS("master_distance_SR101.rds")
BV_df_sum <- readRDS("BV_df_sum.rds")
print("data already imported")
## [1] "data already imported"
After the .rds checkpoints, we can continue with joint
data processing.
In this section, the joint dataframes are adjusted so all track have the same length
### sumarize the length of the tracks: create relative time
master_cor2 <- master_cor %>%
group_by(Track2) %>%
arrange(Time) %>%
mutate(Time2 = Time - dplyr::first(Time))
ggplot(master_cor2, aes(Time2)) + geom_histogram() + facet_wrap(. ~ position)
This plot shows a summary of the imported tracks for each position and
time
max_time <- master_cor2 %>%
group_by(position) %>%
summarise(max_Time = max(Time2))
min_time <- min(max_time$max_Time) #this is the minimal time that any position has
hist(max_time$max_Time)
Here we can see the track length distribution, so we can normalize all tracks to the minimal common time:
master_cor2 <- master_cor2 %>%
group_by(Track2) %>%
filter(Time2 < min_time) ##Minimal time for a position is 2.6
## keep only the tracks that are of the same length (max number of timepoints (13 in this
## case), this is how they were defined) Calculate the most frequent n for each group
subtrack_length <- master_cor2 %>%
group_by(Track2, position) %>%
summarise(n = n_distinct(Time2), first_t = min(Time2), last_t = max(Time2)) %>%
ungroup()
hist(subtrack_length$n)
This histogram shows the track length across all tracks, so the most
common track length can be selected
freq_n <- as.numeric(which.max(table(subtrack_length$n))) # Most frequent n
subtrack_length <- subtrack_length %>%
filter(n == freq_n) # we need to automate this to get the most frequent n
# Now we only keep these subtracks that have the same length:
master_cor2 <- subset(master_cor2, Track2 %in% subtrack_length$Track2)
Plot the unique tracks for each position using the tumor distance as
color label:
### create a parameter for distance to tumor core
master_cor2 <- master_cor2 %>%
group_by(position) %>%
mutate(dist_tumor_1 = scales::rescale(dist_tumor, to = c(0, 100)))
### normalize the distance to tumor factor per position position
master_cor2 <- master_cor2 %>%
group_by(Track2) %>%
arrange(Time2) %>%
mutate(dist_tumor = dist_tumor - dplyr::first(dist_tumor)) %>%
ungroup()
#### check tracks
master_M4_test <- master_cor2 %>%
group_by(position, mouse) %>%
mutate(x = x - min(x), y = y - min(y))
g <- ggplot(master_M4_test) + geom_path(aes(x, y, group = Track2, colour = dist_tumor), arrow = arrow(ends = "last",
type = "closed", length = unit(0.005, "inches"))) + theme_bw()
g <- g + theme(legend.position = "none") + scale_colour_gradient2(low = "blue", mid = "grey", high = "red") +
facet_wrap(. ~ position)
g
Number of unqiue tracks that are processed:
length(unique(master_cor2$Track2))
## [1] 981
Now we need to do some feature engineering, since we make a sliding
window analysis we need to requantify the values of displacement, speed,
etc from scratch
## scaling
library(parallel)
library(dplyr)
library(dtwclust)
library(stats)
library(scales)
# detach(package:spatstat, unload = TRUE) normalize the data:
master_norm <- master_cor2 %>%
ungroup()
column_names2 <- names(master_cor2)
## select what data to normalize
column_names2 <- subset(column_names2, !column_names2 %in% c("Time", "Track2", "x", "y", "z", "position",
"mouse", "class", "Time2", "dist_tumor_1", "dist_3_neigh", "dist_10_neigh", "iteration", "Track2"))
master_norm <- as.data.frame(master_norm)
# transform skwed variables Calculate skewness for each variable
skew_values <- apply(master_norm[c(column_names2)], 2, skewness)
# Identify variables with skewness greater than a threshold (e.g., 0.5)
skewed_variables <- names(skew_values[abs(skew_values) > 2])
# Apply logarithmic transformation to skewed variables
master_norm[, skewed_variables] <- log1p(master_norm[, skewed_variables])
We performa a PCA dimensionality reduction to prepare for the
multivariate analysis
### perfrom PCA on the factors I want to use
master_scaled <- master_norm[c(column_names2)] %>%
mutate(across(everything(), scale)) %>%
ungroup() ##first scale
## now to give more weight to the varibale of dist_tumor so the ability of cells to leave
## master_scaled[,c('dist_tumor')]<-master_scaled[,c('dist_tumor')]*2
master_pca <- prcomp(master_norm[c(column_names2)], scale = T)
master_pca2 <- master_pca[["x"]] ##select only PC
## Semi-supervised learning, we introduce as well the distance to tumor variable that defines
## the direction of the cells, since it is the most important for us Make an option of running
## completely unsupervised (only include the pca) and semi-supervised learning here
master_pca3 <- data.frame(master_norm[, c(2)], master_pca2[, c(1:3)]) ### take only first 3 components since they explain most variation
Choose between an unsupervised or a semi-supervised approach. The
semi-supervised approach introduces the distance to tumor variable in
the analysis
Analysis <- "Semi-supervised" # @param ['Semi-supervised', 'Unsupervised']
if (Analysis == "Semi-supervised") {
colnames(master_pca3)[1] <- "Track2"
print("Semi-supervised Analysis")
} else {
print("Unsupervised Analysis")
}
## [1] "Semi-supervised Analysis"
Plot the PCA information:
pcaCharts <- function(x) {
x.var <- x$sdev^2
x.pvar <- x.var/sum(x.var)
print("proportions of variance:")
print(x.pvar)
par(mfrow = c(2, 2))
plot(x.pvar, xlab = "Principal component", ylab = "Proportion of variance explained", ylim = c(0,
1), type = "b")
plot(cumsum(x.pvar), xlab = "Principal component", ylab = "Cumulative Proportion of variance explained",
ylim = c(0, 1), type = "b")
screeplot(x, main = "Absolute Variance", xlab = "Principal Component")
screeplot(x, type = "l", main = "Absolute Variance")
par(mfrow = c(1, 1))
}
pcaCharts(master_pca)
## [1] "proportions of variance:"
## [1] 0.5643365414 0.1929346874 0.1787666118 0.0631326010 0.0008295584
Once we have normalized, adjusted and prepared all the data for analysis, it is time to proceed to the analysis modules.
The BEHAV3D Tumor Profiler is divided in three distinct modules:
1. Heterogeneity Module - Implements multiparametric single-cell time-series classification, allowing us to identify distinct single-cell behavioral patterns
Optional modules:
2. Large-scale phenotyping module - Performs large-scale TME phenotyping and identifies regions with a specific cellular composition and architecture within the TME of intravitally imaged tumors
3. Small-scale phenotyping module - Further refines TME phenotyping to better understand tumor cell behavior
The optional modules are already including the necessary sections for module 1 to allow for an easy run.
Modules are collapsed to facilitate navigation. Please unscroll the modules you would like to check
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2) ## split into list
if (!file.exists("matrix_distmat.rds")) {
## parallel working load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores() - 2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
### MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat <- as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
} else {
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2 <- as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience
# Adjust parameters according to the experiment characteristics
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)
umap_1 <- as.data.frame(umap_dist$layout)
# scatter3Drgl(umap_1$V1, umap_1$V2, umap_1$V3)
Plot of the raw UMAP
plot(umap_dist$`layout`, # Plot the result
col=rgb(0,0,0,alpha=0.1),
pch=19,
asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
main="Raw UMAP")
Here, we determine the number of clusters for the analysis.
(n_cluster)
umap_1 <- as.data.frame(umap_dist$layout)
Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)
# Clustering
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1] <- "cluster2"
Here. we can obtain different UMAP representations, providing our color
palette
## plot
# Make a color palette that adjusts to the cluster direction Generate a continuous palette
# with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise", "darkorchid4", "darkorchid1",
"deeppink4", "deeppink1", "goldenrod3", "gold"))(50)
# Select 8 colors from the continuous palette
mypalette_1 <- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(cluster2))) + geom_point(size = 2, alpha = 0.8) +
labs(color = "cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = mypalette_1) +
theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1)
UMAP positions colored by large-scale regions
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 0.5, alpha = 0.8) +
labs(color = "Environment_cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = c("cyan",
"gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1) + facet_wrap(mouse ~ position)
UMAP colored by large-scale regions
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 2, alpha = 0.6) +
labs(color = "Tumor region phenotype") + xlab("") + ylab("") + ggtitle("distribution of large scale regions") +
scale_color_manual(values = c("cyan", "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1)
### UMAP direction
Here, we incorporate the MG and SR101 information to project the direction information into the UMAP
### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
group_by(position, class, mouse, Track2) %>%
arrange(Time) %>%
summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))
mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]
saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")
And here is the resulting plot:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = movement)) + geom_point(size = 2,
alpha = 0.8) + labs(color = "Direction") + xlab("") + ylab("") + ggtitle("Direction of movement relative to tumor core") +
theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1) + theme(aspect.ratio = 1) + scale_color_gradient2(midpoint = mid, low = "blue",
mid = "grey80", high = "red3")
In this section, we project the relevant clustering features in a Heatmap
### Create heatmap
sum_all <- master_class_sum %>%
group_by(cluster2) %>%
summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))
Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]
stdize = function(x, ...) {
(x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2
We show all the features extracted in the analysis:
heat_m <- pheatmap(df, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = "Features",
angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
df_cl_features <- df[, c("displacement2", "speed", "max_speed", "displacement_delta", "displacement_length",
"persistance", "movement")]
rownames(df_cl_features) <- rownames(df)
And a heatmap of the features used for clustering:
pheatmap(df_cl_features, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = "Features used for clustering",
angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
We also print tha Cluster movement data (towards or away from the
tumor):
print(Cluster_movement) #3 per cluster we can see the direction
## # A tibble: 7 × 2
## cluster2 movement
## <dbl> <dbl>
## 1 1 -2.77
## 2 7 -0.586
## 3 4 -0.180
## 4 6 0.786
## 5 2 0.980
## 6 5 1.95
## 7 3 8.60
Plots the clusters according to direction and other relevant
features
## Movement within the clusters in UMAP
## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
## set first order clusters based on the heatmap:
UMAP representation with the behavioral clusters:
ggplot(master_class_sum, aes(x = V1, y = V2, color = as.factor(cluster2))) +
geom_point(size = 2, alpha = 1) +
labs(color = "Cluster") +
xlab("") +
ylab("") +
scale_color_manual(values = setNames(mypalette_1, levels(df1$cluster2))) + # Match palette to cluster2 levels ggtitle("UMAP Cluster") +
theme_light(base_size = 20) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
aspect.ratio = 1)
UMAP representation with the behavioral clusters among mice:
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(cluster2))) + geom_point(size = 1, alpha = 0.8) +
labs(color = "cluster") + xlab("") + ylab("") + ggtitle("umap Cluster among mice") + scale_color_manual(values = setNames(mypalette_1,
levels(df1$cluster2))) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1) + facet_wrap(mouse ~ .)
UMAP representation with the localization bistribution of each
track:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = distance_to_tumor)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP localization distribution") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the raw movement of each track:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = movement)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP raw_movement") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the speed distribution of each track:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = speed)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP speed distribution") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
"Spectral")))
UMAP representation with the squared displacement disitribution of each
track:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = disp2)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP disp2distribution") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
"Spectral")))
UMAP representation with the displacement length of each track:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = disp_l)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP disp_length") + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
"Spectral")))
Provides a pie chart of the cluster distribution along the mice
experiments
### plot enrichment on edge and core:
Number_cell_exp <- umap_3 %>%
group_by(position, class, mouse) %>%
summarise(total_cell = n())
Percentage_clus <- left_join(Number_cell_exp, umap_3)
Percentage_clus <- Percentage_clus %>%
group_by(cluster2, position, class, mouse) %>%
summarise(total_cell = mean(total_cell), num_cluster = n()) %>%
mutate(percentage = num_cluster * 100/total_cell) %>%
ungroup()
Percentage_clus_2 <- Percentage_clus %>%
group_by(cluster2, position, class, mouse) %>%
summarise(se_percentage = sd(percentage)/sqrt(length(percentage)), percentage = mean(percentage))
### Plot circular map
Percentage_clus_2$cluster2 <- as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2 <- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))
Behavioral cluster distribution in each mouse
Per2 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per2 <- Per2 + facet_grid(. ~ mouse)
Per2 <- Per2 + theme_void() + scale_fill_manual(values = mypalette_1) + theme(strip.text.x = element_text(size = 8,
angle = 90))
Per2
Mouse distribution in eacch behavioral cluster
Per4 <- ggplot(Percentage_clus_2, aes(fill = as.factor(mouse), y = percentage, x = "")) + geom_bar(stat = "identity",
position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per4 <- Per4 + facet_grid(cluster2 ~ .)
Per4 <- Per4 + theme_void() + theme(strip.text.x = element_text(size = 8, angle = 90))
Per4
This section analyzes the differences between clusters for different features and provides statistical support in each case.
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)
#### try an anova between clusters based speed
p_speed <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = speed, group = cluster2, fill = cluster2)) +
geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("speed per cluster") +
theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
20))
a_speed <- aov(speed ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based direction
p_direction <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = movement, group = cluster2,
fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("speed per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(-15, 20))
a_direction <- aov(movement ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_3_neigh, group = cluster2,
fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(alpha = 0.5, outlier.colour = NA) +
ggtitle("dist 3 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(10, 70))
a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_10_neigh, group = cluster2,
fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("dist 10 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(20, 90))
a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(cluster2), y = min_MG,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("distance to closest MG") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(2, 45))
a_min_MG <- aov(min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(cluster2),
y = min_SR101, group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) +
geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("distance to closest SR101") + theme_classic() +
theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
50))
a_min_SR101 <- aov(min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)), aes(x = as.factor(cluster2), y = n_MG,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("n MG") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 15))
a_n_MG <- aov(n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)), aes(x = as.factor(cluster2), y = n_SR101,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("n SR101") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_nSR101 <- aov(n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
#### try an anova between clusters based on mean distance to BV
p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(cluster2), y = BV_mean,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_mean") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_BV_mean <- aov(BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
#### try an anova between clusters based on min distance to BV
p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_min,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_min") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_BV_min <- aov(BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on contact to BV
p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_contact,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_contact") + theme_classic() + scale_fill_manual(values = mypalette_1) +
theme(aspect.ratio = 0.7)
a_BV_contact <- aov(BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
#### try an anova between clusters based on sd distance to BV
p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_sd,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("Standart deviation distance") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0, 1.7))
a_BV_sd <- aov(BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
Statistical information based on ANOVA and Tukey’s HSD (Honestly
Significant Difference) test:
summary(a_speed)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 4396 732.6 86.4 <2e-16 ***
## Residuals 974 8259 8.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_speed)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-1 -0.78537525 -1.7950989 0.2243484 0.2458651
## 4-1 -1.43519746 -2.2804303 -0.5899646 0.0000129
## 6-1 -1.36724890 -2.2314265 -0.5030713 0.0000687
## 2-1 5.28517554 4.1197006 6.4506505 0.0000000
## 5-1 1.11075630 0.2386033 1.9829093 0.0033633
## 3-1 5.31199878 4.0381500 6.5858476 0.0000000
## 4-7 -0.64982221 -1.7138560 0.4142116 0.5453741
## 6-7 -0.58187365 -1.6610178 0.4972705 0.6868864
## 2-7 6.07055080 4.7378539 7.4032477 0.0000000
## 5-7 1.89613156 0.8105901 2.9816730 0.0000062
## 3-7 6.09737403 4.6689343 7.5258138 0.0000000
## 6-4 0.06794856 -0.8591053 0.9950024 0.9999914
## 2-4 6.72037300 5.5075425 7.9332035 0.0000000
## 5-4 2.54595376 1.6114609 3.4804466 0.0000000
## 3-4 6.74719624 5.4298820 8.0645105 0.0000000
## 2-6 6.65242444 5.4263159 7.8785330 0.0000000
## 5-6 2.47800520 1.5263429 3.4296675 0.0000000
## 3-6 6.67924768 5.3496985 8.0087969 0.0000000
## 5-2 -4.17441924 -5.4061620 -2.9426765 0.0000000
## 3-2 0.02682324 -1.5156521 1.5692985 1.0000000
## 3-5 4.20124247 2.8664956 5.5359893 0.0000000
summary(a_direction)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 8010 1335.0 138.2 <2e-16 ***
## Residuals 974 9406 9.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_direction)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = movement ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-1 1.6666646 0.58913889 2.7441902 0.0001120
## 4-1 3.3088156 2.40682612 4.2108051 0.0000000
## 6-1 3.4597360 2.53752973 4.3819424 0.0000000
## 2-1 1.2692124 0.02547688 2.5129480 0.0419620
## 5-1 4.8981440 3.96742665 5.8288613 0.0000000
## 3-1 12.2319988 10.87261218 13.5913854 0.0000000
## 4-7 1.6421510 0.50666839 2.7776337 0.0004230
## 6-7 1.7930715 0.64146377 2.9446792 0.0000973
## 2-7 -0.3974521 -1.81963846 1.0247342 0.9823139
## 5-7 3.2314794 2.07304487 4.3899139 0.0000000
## 3-7 10.5653342 9.04097607 12.0896924 0.0000000
## 6-4 0.1509204 -0.83838426 1.1402251 0.9993683
## 2-4 -2.0396032 -3.33387417 -0.7453322 0.0000750
## 5-4 1.5893284 0.59208515 2.5865716 0.0000584
## 3-4 8.9231832 7.51741249 10.3289539 0.0000000
## 2-6 -2.1905236 -3.49896423 -0.8820830 0.0000184
## 5-6 1.4384079 0.42284234 2.4539735 0.0006171
## 3-6 8.7722627 7.35343553 10.1910900 0.0000000
## 5-2 3.6289315 2.31447839 4.9433847 0.0000000
## 3-2 10.9627864 9.31673524 12.6088375 0.0000000
## 3-5 7.3338548 5.90948096 8.7582287 0.0000000
summary(a_dist_3_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 4102 683.6 3.107 0.00509 **
## Residuals 974 214307 220.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-1 0.01754239 -5.1258637 5.160949 1.0000000
## 4-1 2.82930265 -1.4762080 7.134813 0.4532049
## 6-1 2.85341888 -1.5485937 7.255431 0.4706551
## 2-1 7.57868723 1.6419034 13.515471 0.0032524
## 5-1 3.45271062 -0.9899280 7.895349 0.2467792
## 3-1 2.04498274 -4.4438440 8.533809 0.9675264
## 4-7 2.81176025 -2.6082948 8.231815 0.7249982
## 6-7 2.83587649 -2.6611491 8.332902 0.7302294
## 2-7 7.56114484 0.7725531 14.349737 0.0178448
## 5-7 3.43516823 -2.0944441 8.964781 0.5242611
## 3-7 2.02744035 -5.2488531 9.303734 0.9825831
## 6-4 0.02411623 -4.6981803 4.746413 1.0000000
## 2-4 4.74938459 -1.4286226 10.927392 0.2590727
## 5-4 0.62340798 -4.1367819 5.383598 0.9997383
## 3-4 -0.78431991 -7.4945540 5.925914 0.9998651
## 2-6 4.72526835 -1.5203754 10.970912 0.2775223
## 5-6 0.59929174 -4.2483572 5.446941 0.9998129
## 3-6 -0.80843614 -7.5809936 5.964121 0.9998475
## 5-2 -4.12597661 -10.4003203 2.148367 0.4523113
## 3-2 -5.53370449 -13.3908810 2.323472 0.3649053
## 3-5 -1.40772788 -8.2067615 5.391306 0.9964572
summary(a_dist_10_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 6625 1104.1 2.662 0.0144 *
## Residuals 974 403908 414.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-1 0.5390424 -6.5220820 7.600167 0.9999891
## 4-1 2.6742951 -3.2365247 8.585115 0.8345260
## 6-1 4.3742336 -1.6690688 10.417536 0.3307890
## 2-1 8.8396869 0.6893739 16.990000 0.0235533
## 5-1 5.0215581 -1.0775177 11.120634 0.1861065
## 3-1 5.7174464 -3.1907385 14.625631 0.4832455
## 4-7 2.1352527 -5.3056692 9.576175 0.9797377
## 6-7 3.8351912 -3.7113996 11.381782 0.7440315
## 2-7 8.3006444 -1.0190729 17.620362 0.1177185
## 5-7 4.4825157 -3.1088119 12.073843 0.5860585
## 3-7 5.1784040 -4.8108546 15.167663 0.7256775
## 6-4 1.6999385 -4.7830657 8.182943 0.9873086
## 2-4 6.1653918 -2.3160846 14.646868 0.3255126
## 5-4 2.3472630 -4.1877630 8.882289 0.9391248
## 3-4 3.0431513 -6.1689927 12.255295 0.9591091
## 2-6 4.4654532 -4.1088780 13.039784 0.7212971
## 5-6 0.6473245 -6.0077698 7.302419 0.9999541
## 3-6 1.3432128 -7.9544919 10.640918 0.9995382
## 5-2 -3.8181288 -12.4318607 4.795603 0.8475663
## 3-2 -3.1222404 -13.9089642 7.664483 0.9788229
## 3-5 0.6958883 -8.6381641 10.029941 0.9999905
summary(a_min_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 2406 401 2.823 0.0102 *
## Residuals 586 83222 142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -2.70009641 -7.897292 2.4970992 0.7222555
## 4-1 -2.19021452 -7.069143 2.6887144 0.8385902
## 6-1 -2.87138117 -7.498036 1.7552736 0.5240296
## 2-1 -6.51903163 -12.427182 -0.6108816 0.0197966
## 5-1 -2.78033015 -7.148591 1.5879312 0.4923842
## 3-1 -6.78499267 -13.201425 -0.3685600 0.0302985
## 4-7 0.50988190 -5.312614 6.3323778 0.9999750
## 6-7 -0.17128475 -5.784078 5.4415083 1.0000000
## 2-7 -3.81893522 -10.527418 2.8895480 0.6269088
## 5-7 -0.08023373 -5.482013 5.3215454 1.0000000
## 3-7 -4.08489625 -11.245072 3.0752797 0.6244649
## 6-4 -0.68116665 -6.000617 4.6382836 0.9997677
## 2-4 -4.32881711 -10.793866 2.1362314 0.4278908
## 5-4 -0.59011563 -5.686420 4.5061892 0.9998708
## 3-4 -4.59477815 -11.527398 2.3378422 0.4409182
## 2-6 -3.64765047 -9.924500 2.6291991 0.6033035
## 5-6 0.09105102 -4.764287 4.9463894 1.0000000
## 3-6 -3.91361150 -10.671068 2.8438445 0.6072444
## 5-2 3.73870148 -2.350191 9.8275940 0.5372617
## 3-2 -0.26596104 -7.957743 7.4258212 0.9999999
## 3-5 -4.00466252 -10.587898 2.5785726 0.5487295
summary(a_min_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1563 260.6 1.619 0.14
## Residuals 381 61311 160.9
TukeyHSD(a_min_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 1.8361891 -5.4820643 9.154442 0.9897048
## 4-1 4.6631659 -0.6949473 10.021279 0.1351299
## 6-1 4.0937335 -1.7782677 9.965735 0.3748810
## 2-1 5.2990323 -3.4047456 14.002810 0.5457186
## 5-1 4.7717052 -2.0412179 11.584628 0.3690700
## 3-1 2.2523708 -7.3825575 11.887299 0.9929490
## 4-7 2.8269768 -4.5012266 10.155180 0.9141573
## 6-7 2.2575444 -5.4543676 9.969456 0.9770703
## 2-7 3.4628433 -6.5745338 13.500220 0.9486322
## 5-7 2.9355161 -5.5148750 11.385907 0.9469299
## 3-7 0.4161817 -10.4385422 11.270906 0.9999998
## 6-4 -0.5694324 -6.4538297 5.314965 0.9999542
## 2-4 0.6358664 -8.0762793 9.348012 0.9999914
## 5-4 0.1085393 -6.7150708 6.932149 1.0000000
## 3-4 -2.4107951 -12.0532831 7.231693 0.9898966
## 2-6 1.2052988 -7.8319853 10.242583 0.9997011
## 5-6 0.6779717 -6.5561610 7.912104 0.9999621
## 3-6 -1.8413627 -11.7785956 8.095870 0.9980414
## 5-2 -0.5273272 -10.2024513 9.147797 0.9999985
## 3-2 -3.0466616 -14.8798331 8.786510 0.9881985
## 3-5 -2.5193344 -13.0399864 8.001318 0.9919743
summary(a_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 80 13.29 1.318 0.247
## Residuals 586 5908 10.08
TukeyHSD(a_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 0.15849325 -1.2262713 1.543258 0.9998793
## 4-1 0.28747795 -1.0124862 1.587442 0.9948575
## 6-1 0.34226190 -0.8904852 1.575009 0.9827356
## 2-1 0.47922999 -1.0949645 2.053425 0.9724161
## 5-1 0.24664225 -0.9172573 1.410542 0.9959296
## 3-1 1.58897243 -0.1206513 3.298596 0.0881023
## 4-7 0.12898471 -1.4223878 1.680357 0.9999816
## 6-7 0.18376866 -1.3117296 1.679267 0.9998172
## 2-7 0.32073674 -1.4667022 2.108176 0.9983899
## 5-7 0.08814900 -1.3511258 1.527424 0.9999970
## 3-7 1.43047918 -0.4773109 3.338269 0.2871225
## 6-4 0.05478395 -1.3625547 1.472123 0.9999998
## 2-4 0.19175204 -1.5308251 1.914329 0.9998974
## 5-4 -0.04083571 -1.3987185 1.317047 1.0000000
## 3-4 1.30149448 -0.5456646 3.148654 0.3631688
## 2-6 0.13696809 -1.5354644 1.809401 0.9999832
## 5-6 -0.09561966 -1.3892982 1.198059 0.9999909
## 3-6 1.24671053 -0.5537770 3.047198 0.3851439
## 5-2 -0.23258774 -1.8549401 1.389765 0.9995530
## 3-2 1.10974244 -0.9396912 3.159176 0.6811010
## 3-5 1.34233018 -0.4117371 3.096397 0.2633964
summary(a_nSR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 278 46.41 3.19 0.00456 **
## Residuals 381 5542 14.55
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -1.45959596 -3.659836 0.7406436 0.4379301
## 4-1 -1.93351886 -3.544440 -0.3225972 0.0076279
## 6-1 -1.54372294 -3.309145 0.2216996 0.1313095
## 2-1 -1.55862978 -4.175429 1.0581690 0.5722359
## 5-1 -2.42424242 -4.472554 -0.3759309 0.0090591
## 3-1 -0.57070707 -3.467457 2.3260429 0.9972406
## 4-7 -0.47392290 -2.677154 1.7293081 0.9955166
## 6-7 -0.08412698 -2.402720 2.2344664 0.9999999
## 2-7 -0.09903382 -3.116780 2.9187128 0.9999999
## 5-7 -0.96464646 -3.505264 1.5759713 0.9200414
## 3-7 0.88888889 -2.374594 4.1523715 0.9841590
## 6-4 0.38979592 -1.379353 2.1589453 0.9948872
## 2-4 0.37488909 -2.244425 2.9942037 0.9995503
## 5-4 -0.49072356 -2.542248 1.5608010 0.9920220
## 3-4 1.36281179 -1.536211 4.2618346 0.8052200
## 2-6 -0.01490683 -2.731975 2.7021609 1.0000000
## 5-6 -0.88051948 -3.055468 1.2944291 0.8939402
## 3-6 0.97301587 -2.014622 3.9606540 0.9610436
## 5-2 -0.86561265 -3.774448 2.0432223 0.9750862
## 3-2 0.98792271 -2.569731 4.5455765 0.9824934
## 3-5 1.85353535 -1.309508 5.0165790 0.5915556
summary(a_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 68 11.36 0.323 0.925
## Residuals 625 22004 35.21
TukeyHSD(a_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -0.26577225 -2.790540 2.258995 0.9999263
## 4-1 -0.13978253 -2.188181 1.908616 0.9999943
## 6-1 -0.10340353 -2.264545 2.057738 0.9999993
## 2-1 -0.08589028 -3.033728 2.861947 1.0000000
## 5-1 0.35652916 -2.010124 2.723182 0.9994065
## 3-1 -1.35613935 -4.836421 2.124142 0.9112516
## 4-7 0.12598973 -2.497241 2.749221 0.9999993
## 6-7 0.16236872 -2.549815 2.874552 0.9999974
## 2-7 0.17988197 -3.192825 3.552589 0.9999987
## 5-7 0.62230141 -2.256319 3.500922 0.9954657
## 3-7 -1.09036710 -4.937154 2.756419 0.9808229
## 6-4 0.03637899 -2.239016 2.311774 1.0000000
## 2-4 0.05389225 -2.978704 3.086488 1.0000000
## 5-4 0.49631168 -1.975113 2.967736 0.9969812
## 3-4 -1.21635682 -4.768715 2.336002 0.9510608
## 2-6 0.01751325 -3.092348 3.127374 1.0000000
## 5-6 0.45993269 -2.105713 3.025578 0.9984014
## 3-6 -1.25273581 -4.871278 2.365806 0.9484331
## 5-2 0.44241943 -2.813614 3.698452 0.9996728
## 3-2 -1.27024907 -5.407043 2.866545 0.9712444
## 3-5 -1.71266850 -5.457580 2.032243 0.8264509
summary(a_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 312 51.96 1.826 0.0916 .
## Residuals 625 17782 28.45
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -0.18579179 -2.455449 2.0838654 0.9999833
## 4-1 0.75936189 -1.082060 2.6007836 0.8864390
## 6-1 0.55907650 -1.383697 2.5018496 0.9792776
## 2-1 -1.62580107 -4.275780 1.0241781 0.5385067
## 5-1 0.06627626 -2.061243 2.1937957 0.9999999
## 3-1 -1.65308922 -4.781712 1.4755340 0.7061094
## 4-7 0.94515369 -1.413018 3.3033255 0.8995563
## 6-7 0.74486829 -1.693268 3.1830044 0.9719640
## 2-7 -1.44000928 -4.471928 1.5919092 0.7992332
## 5-7 0.25206806 -2.335688 2.8398237 0.9999533
## 3-7 -1.46729742 -4.925393 1.9907980 0.8719077
## 6-4 -0.20028539 -2.245768 1.8451970 0.9999518
## 2-4 -2.38516296 -5.111336 0.3410103 0.1313725
## 5-4 -0.69308563 -2.914790 1.5286186 0.9688877
## 3-4 -2.41245111 -5.605868 0.7809662 0.2784772
## 2-6 -2.18487757 -4.980509 0.6107536 0.2398941
## 5-6 -0.49280023 -2.799205 1.8136045 0.9957478
## 3-6 -2.21216571 -5.465079 1.0407480 0.4082676
## 5-2 1.69207734 -1.234956 4.6191109 0.6096086
## 3-2 -0.02728814 -3.746088 3.6915114 1.0000000
## 3-5 -1.71936548 -5.085880 1.6471489 0.7384177
summary(a_BV_contact)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 0.85 0.1413 0.871 0.516
## Residuals 625 101.37 0.1622
TukeyHSD(a_BV_contact)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -0.0242750806 -0.19563914 0.14708898 0.9995836
## 4-1 -0.0229229275 -0.16195428 0.11610842 0.9990053
## 6-1 -0.0009914992 -0.14767510 0.14569210 1.0000000
## 2-1 -0.0178269590 -0.21790616 0.18225224 0.9999724
## 5-1 0.0761225829 -0.08450977 0.23675493 0.8009201
## 3-1 -0.0924266186 -0.32864449 0.14379125 0.9096130
## 4-7 0.0013521532 -0.17669496 0.17939926 1.0000000
## 6-7 0.0232835814 -0.16080100 0.20736817 0.9997841
## 2-7 0.0064481216 -0.22246833 0.23536457 1.0000000
## 5-7 0.1003976635 -0.09498352 0.29577885 0.7327874
## 3-7 -0.0681515380 -0.32924527 0.19294220 0.9875043
## 6-4 0.0219314282 -0.13250695 0.17636980 0.9995777
## 2-4 0.0050959684 -0.20073605 0.21092798 1.0000000
## 5-4 0.0990455103 -0.06869800 0.26678902 0.5848464
## 3-4 -0.0695036912 -0.31061366 0.17160627 0.9790956
## 2-6 -0.0168354598 -0.22791170 0.19424078 0.9999857
## 5-6 0.0771140821 -0.09702450 0.25125267 0.8473359
## 3-6 -0.0914351194 -0.33703719 0.15416695 0.9277082
## 5-2 0.0939495419 -0.12704786 0.31494695 0.8708864
## 3-2 -0.0745996596 -0.35537712 0.20617780 0.9862871
## 3-5 -0.1685492015 -0.42272837 0.08562997 0.4404848
summary(a_BV_sd)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 7.80 1.3002 12.02 8.36e-13 ***
## Residuals 625 67.59 0.1081
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 7-1 -0.007171339 -0.14710581 0.13276313 0.9999990
## 4-1 -0.125418351 -0.23895020 -0.01188650 0.0195411
## 6-1 -0.138910342 -0.25869095 -0.01912973 0.0114031
## 2-1 0.264153311 0.10077030 0.42753632 0.0000443
## 5-1 0.056369400 -0.07480164 0.18754044 0.8649533
## 3-1 0.123263336 -0.06963021 0.31615689 0.4875780
## 4-7 -0.118247012 -0.26363880 0.02714478 0.1977911
## 6-7 -0.131739002 -0.28206095 0.01858294 0.1300277
## 2-7 0.271324650 0.08439338 0.45825592 0.0004068
## 5-7 0.063540739 -0.09600592 0.22308739 0.9023048
## 3-7 0.130434676 -0.08277230 0.34364165 0.5420015
## 6-4 -0.013491991 -0.13960508 0.11262110 0.9999189
## 2-4 0.389571662 0.22149095 0.55765238 0.0000000
## 5-4 0.181787751 0.04480979 0.31876571 0.0018504
## 3-4 0.248681687 0.05179329 0.44557008 0.0038276
## 2-6 0.403063652 0.23070055 0.57542676 0.0000000
## 5-6 0.195279742 0.05307962 0.33747986 0.0010715
## 3-6 0.262173678 0.06161707 0.46273029 0.0023283
## 5-2 -0.207783911 -0.38824856 -0.02731927 0.0123953
## 3-2 -0.140889974 -0.37017052 0.08839057 0.5365537
## 3-5 0.066893937 -0.14066666 0.27445453 0.9634729
The output in this section shows a boxplot for each feature in each
behavioral cluster
p_speed
p_direction
p_dist3_neigth
p_dist_10_neigh
p_min_MG
p_min_SR101
p_n_MG
p_nSR101
p_BV_mean
p_BV_min
p_BV_contact
p_BV_sd
Select a position of interest to save .txt information
about the specified position of interest in a .txt
file.
This code section uses the master_distance.rds file
generated in the File Import section
# Determine position of interest
position_interest <- "2430F11.CL3_1"
names_cell <- master_class[!duplicated(master_class$mouse), ]
names_cell$position
## [1] 2430F11.CL3_1 2430F13.CL2_2 2430F16.CL3_1 2430F17.CL3_2 2430F18.CL2_1
## [6] 2430F12.CL2_2
## 36 Levels: 2430F11.CL1_1 2430F12.CL1_1 2430F13.CL1_1 ... 2430F18.CL3_2
Tracks_1 <- master_class[which(master_class$position == position_interest), ]
Tracks_1 <- Tracks_1[!duplicated(Tracks_1$Track2), c(2, which(colnames(master_class) == "cluster2"))]
## Join with information on unique TrackID for that experiment
# Generated during the data import
master_distance <- readRDS("D:/BEHAV3D_Tumor_Profiler/results/master_distance.rds")
master_distance <- master_distance[c("TrackID", "Track2")] %>%
distinct(TrackID, .keep_all = TRUE)
Tracks_1 <- left_join(Tracks_1, master_distance)
Tracks_1$Track2 <- NULL
Track_1_list <- split(Tracks_1, Tracks_1$cluster2)
write(paste(as.character(Track_1_list), sep = "' '", collapse = ", "), paste0(position_interest,
".txt"))
Then, we plot the cell trajectories combined with the behavioral cluster information:
### Plot cell trajectories:
## bind cluster information data to original dataset:
Tracks_1 <- master_class
### plot all the tracks
Tracks_1 <- Tracks_1[!duplicated(Tracks_1$Track2), c("cluster2", "Track2")]
Tracks_1$cluster2 <- as.factor(Tracks_1$cluster2)
master_traject <- left_join(master_cor2, Tracks_1)
master_traject <- na.omit(master_traject)
master_traject <- subset(master_traject, cluster2 != 0)
master_traject <- master_traject %>%
group_by(position, mouse) %>%
mutate(x_new = x - min(x), y_new = y - min(y))
master_traject$cluster2 = factor(master_traject$cluster2, levels = df1$cluster2)
g_all_track <- ggplot(master_traject) + geom_path(aes(x_new, y_new, group = Track2, color = as.factor(cluster2)),
size = 0.2, arrow = arrow(ends = "last", type = "open", length = unit(0.01, "inches"))) + scale_color_manual(values = mypalette_1) +
theme_bw() + theme(aspect.ratio = 1) + facet_wrap(class ~ position)
g_all_track
g_track_int1 <- ggplot(subset(master_traject, position %in% position_interest)) + geom_path(aes(x_new,
y_new, group = Track2, color = as.factor(cluster2)), size = 0.5, arrow = arrow(ends = "last",
type = "open", length = unit(0, "inches"))) + scale_color_manual(values = mypalette_1) + ggtitle(paste0(position_interest)) +
theme_bw() + theme(aspect.ratio = 1)
g_track_int1
g_track_int2 <- ggplot(subset(master_traject, position %in% position_interest)) + geom_path(aes(x_new,
y_new, group = Track2), size = 0.5, arrow = arrow(ends = "last", type = "open", length = unit(0,
"inches"))) + scale_color_manual(values = mypalette_1) + ggtitle(paste0(position_interest)) +
theme_bw() + theme(aspect.ratio = 1)
g_track_int2
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2) ## split into list
if (!file.exists("matrix_distmat.rds")) {
## parallel working load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores() - 2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
### MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat <- as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
} else {
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2 <- as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience
# Modify the parameters according to the characteristics of the experiment
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)
umap_1 <- as.data.frame(umap_dist$layout)
Here, we can see the Raw UMAP plot:
plot(umap_dist$`layout`, # Plot the result
col=rgb(0,0,0,alpha=0.1),
pch=19,
asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
main="Raw UMAP")
Here we determine the number of clusters for the analysis.
(n_cluster)
umap_1 <- as.data.frame(umap_dist$layout)
Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)
# Clustering
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1] <- "cluster2"
UMAP positions colored by large-scale regions
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 0.5, alpha = 0.8) +
labs(color = "Environment_cluster") + xlab("") + ylab("") + ggtitle("umap Cluster") + scale_color_manual(values = c("cyan",
"gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1) + facet_wrap(mouse ~ position)
UMAP colored by large-scale regions
ggplot(umap_3, aes(x = V1, y = V2, color = as.factor(class))) + geom_point(size = 2, alpha = 0.6) +
labs(color = "Tumor region phenotype") + xlab("") + ylab("") + ggtitle("distribution of large scale regions") +
scale_color_manual(values = c("cyan", "gold", "red")) + theme_light(base_size = 20) + theme(axis.text.x = element_blank(),
axis.text.y = element_blank()) + theme(aspect.ratio = 1)
### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
group_by(position, class, mouse, Track2) %>%
arrange(Time) %>%
summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))
mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]
saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")
Provides a pie chart of the cluster distribution along the mice experiments
sum_all <- master_class_sum %>%
group_by(cluster2) %>%
summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))
Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]
stdize = function(x, ...) {
(x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2
## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
# Make a color palette that adjusts to the cluster direction Generate a continuous palette
# with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise", "darkorchid4", "darkorchid1",
"deeppink4", "deeppink1", "goldenrod3", "gold"))(50)
# Select 8 colors from the continuous palette
mypalette_1 <- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]
### plot enrichment on edge and core:
Number_cell_exp <- umap_3 %>%
group_by(position, class, mouse) %>%
summarise(total_cell = n())
Percentage_clus <- left_join(Number_cell_exp, umap_3)
Percentage_clus <- Percentage_clus %>%
group_by(cluster2, position, class, mouse) %>%
summarise(total_cell = mean(total_cell), num_cluster = n()) %>%
mutate(percentage = num_cluster * 100/total_cell) %>%
ungroup()
Percentage_clus_2 <- Percentage_clus %>%
group_by(cluster2, position, class, mouse) %>%
summarise(se_percentage = sd(percentage)/sqrt(length(percentage)), percentage = mean(percentage))
Plots the pie chart of:
### Plot circular map
Percentage_clus_2$cluster2 <- as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2 <- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))
Per1 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per1 <- Per1 + facet_grid(class ~ mouse)
Per1 <- Per1 + theme_void() + scale_fill_manual(values = mypalette_1) + theme(strip.text.x = element_text(size = 8,
angle = 90))
Per1
- Behavioral cluster distribution in large-scale regions
Per3 <- ggplot(Percentage_clus_2, aes(fill = as.factor(cluster2), y = percentage, x = "")) + geom_bar(stat = "identity",
position = "fill", width = 0.5) + coord_flip() + scale_y_reverse()
Per3 <- Per3 + facet_grid(class ~ .)
Per3 <- Per3 + theme_void() + theme(strip.text.x = element_text(size = 8, angle = 90)) + scale_fill_manual(values = mypalette_1)
Per3
# Additional Statistics
# Create an empty dataframe to store results
results_df <- data.frame(cluster2 = character(), p_value = numeric(), contrast = character(), pairwise_p_values = numeric())
# Create an empty list to store plots
plots_list <- list()
# Iterate over unique clusters
for (cluster in unique(Percentage_clus_2$cluster2)) {
# Subset data for the current cluster
cluster_data <- subset(Percentage_clus_2, cluster2 == cluster)
# Calculate mean percentage for each mouse and subtract from original percentage
normalized_data <- cluster_data %>%
group_by(mouse) %>%
mutate(normalized_percentage = (percentage - mean(percentage))/sd(percentage)) %>%
ungroup()
# Fit model with normalized percentage values
model <- lm(normalized_percentage ~ class, data = normalized_data)
# Extract estimates and p-value from model summary
model_summary <- summary(model)
p_value <- anova(model)$`Pr(>F)`[1]
# Conduct pairwise comparisons using emmeans
pairwise_comp <- emmeans(model, pairwise ~ class, adjust = "tukey")
# Extract pairwise comparisons and p-values
pairwise_p_values <- summary(pairwise_comp)[["contrasts"]][["p.value"]]
contrast <- summary(pairwise_comp)[["contrasts"]][["contrast"]]
# Append cluster name, estimates, and p-value to results dataframe
results_df <- rbind(results_df, data.frame(cluster2 = cluster, p_value = p_value, contrast = contrast,
pairwise_p_values = pairwise_p_values))
# Create boxplot of normalized percentages by class
plot <- ggplot(normalized_data, aes(x = class, y = normalized_percentage)) + geom_boxplot(width = 0.5) +
geom_jitter(width = 0.3) + labs(title = paste("Cluster:", cluster)) + theme_bw() + theme(aspect.ratio = 1)
# Store plot in the list
plots_list[[cluster]] <- plot
}
Display the results dataframe:
# Display the results dataframe
print(results_df)
## cluster2 p_value contrast pairwise_p_values
## 1 1 0.1986608 CL1 - CL2 0.65206170
## 2 1 0.1986608 CL1 - CL3 0.58570646
## 3 1 0.1986608 CL2 - CL3 0.17343672
## 4 2 0.2429224 CL1 - CL2 0.22600442
## 5 2 0.2429224 CL1 - CL3 0.83603149
## 6 2 0.2429224 CL2 - CL3 0.49465662
## 7 3 0.2900285 CL1 - CL2 0.83026658
## 8 3 0.2900285 CL1 - CL3 0.26803502
## 9 3 0.2900285 CL2 - CL3 0.56701266
## 10 4 0.5987530 CL1 - CL2 0.92434864
## 11 4 0.5987530 CL1 - CL3 0.80043548
## 12 4 0.5987530 CL2 - CL3 0.57652596
## 13 5 0.2732207 CL1 - CL2 0.33922788
## 14 5 0.2732207 CL1 - CL3 0.34641925
## 15 5 0.2732207 CL2 - CL3 0.99741741
## 16 6 0.4322490 CL1 - CL2 0.50574638
## 17 6 0.4322490 CL1 - CL3 0.48947933
## 18 6 0.4322490 CL2 - CL3 0.99955656
## 19 7 0.0163081 CL1 - CL2 0.31132492
## 20 7 0.0163081 CL1 - CL3 0.01420838
## 21 7 0.0163081 CL2 - CL3 0.10806546
Plot the statistics in large-scale regions (environmental
clusters)
plots_list
## $`1`
##
## $`2`
##
## $`3`
##
## $`4`
##
## $`5`
##
## $`6`
##
## $`7`
This section analyzes the differences between large-scale regions for different features and provides statistical support in each case.
#### Differences based on environmental clusters
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)
#### try an anova between clusters based on distance to speed
p_class_speed <- ggplot(master_class_sum2, aes(x = as.factor(class), y = speed, group = class, fill = class)) +
geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("speed per env cluster") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 20))
a_class_speed <- aov(speed ~ class, data = master_class_sum2)
#### try an anova between clusters based on squared displacement
p_class_disp2 <- ggplot(master_class_sum2, aes(x = as.factor(class), y = disp2, group = class, fill = class)) +
geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("speed per env cluster") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 180))
a_class_disp2 <- aov(disp2 ~ class, data = master_class_sum2)
#### try an anova between clusters based on raw tumor movement
p_class_movement <- ggplot(master_class_sum2, aes(x = as.factor(class), y = movement, group = class,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("movement direction per env cluster") +
theme_classic() + theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4,
"Spectral"))) + coord_cartesian(ylim = c(-10, 17))
a_class_movement <- aov(movement ~ class, data = master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_class_dist_10neigh <- ggplot(master_class_sum2, aes(x = as.factor(class), y = dist_10_neigh, fill = class)) +
geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("dist 10 neigh per cluster") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(20, 70))
a_class_dist_10neigh <- aov(dist_10_neigh ~ class, data = master_class_sum2)
#### try an anova between clusters based on dist to MG
p_class_minMG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = min_MG,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("distance to closest MG") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 50))
a_class_minMG <- aov(min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_classmin_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
y = min_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("distance to closest SR101") + theme_classic() + theme(aspect.ratio = 1) +
coord_cartesian(ylim = c(0, 50))
a_classmin_SR101 <- aov(min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_class_n_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = n_MG,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("n of MG per cluster") +
theme_classic() + theme(aspect.ratio = 1)
a_class_n_MG <- aov(n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_classn_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
y = n_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("n SR101 per cluster") + theme_classic() + theme(aspect.ratio = 1)
a_classn_SR101 <- aov(n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on BV distance min
p_class_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(class), y = BV_min,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("min distance to BV") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 18))
a_class_BV_min <- aov(BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on BV distance mean
p_class_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(class),
y = BV_mean, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("mean distance to BV") + theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0,
18))
a_class_BV_mean <- aov(BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
## relation between position relative to the tumor and amount of MG:
pcor_MG_position <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = distance_to_tumor,
y = n_MG)) + geom_jitter() + geom_point(alpha = 0.5, stat = "summary") + geom_smooth(method = "lm") +
ggtitle("scatter plot movement vs min_MG") + theme_classic() + theme(aspect.ratio = 1)
cor_18 <- cor.test(master_class_sum2$n_MG, master_class_sum2$distance_to_tumor, method = "pearson",
use = "pairwise.complete.obs")
Statistical information between large-scale regions (environmental clusters) based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:
summary(a_class_speed)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 167 83.32 6.525 0.00153 **
## Residuals 978 12488 12.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_speed)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 0.1001108 -0.5744387 0.7746603 0.9353020
## CL3-CL1 0.9064598 0.2405857 1.5723339 0.0041126
## CL3-CL2 0.8063490 0.1702051 1.4424928 0.0084220
summary(a_class_disp2)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2820 1410 0.064 0.938
## Residuals 978 21460215 21943
TukeyHSD(a_class_disp2)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = disp2 ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 4.144633 -23.81786 32.10713 0.9354622
## CL3-CL1 3.097067 -24.50580 30.69993 0.9624874
## CL3-CL2 -1.047566 -27.41801 25.32288 0.9952179
summary(a_class_movement)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 23 11.67 0.656 0.519
## Residuals 978 17392 17.78
TukeyHSD(a_class_movement)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = movement ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 0.1138014 -0.6822461 0.9098489 0.9398244
## CL3-CL1 0.3680570 -0.4177525 1.1538665 0.5146473
## CL3-CL2 0.2542556 -0.4964687 1.0049800 0.7062059
summary(a_class_dist_10neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 1320 659.9 1.577 0.207
## Residuals 978 409213 418.4
TukeyHSD(a_class_dist_10neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 -2.5151366 -6.376437 1.346164 0.2777655
## CL3-CL1 -0.1387336 -3.950374 3.672907 0.9959840
## CL3-CL2 2.3764030 -1.265054 6.017860 0.2764354
summary(a_class_minMG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2813 1406.4 10.02 5.26e-05 ***
## Residuals 590 82815 140.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_minMG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -5.075902 -7.941646 -2.210159 0.0001074
## CL3-CL1 -3.773329 -6.454107 -1.092551 0.0028689
## CL3-CL2 1.302574 -1.640317 4.245465 0.5519149
summary(a_classmin_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 3438 1719.0 11.13 1.99e-05 ***
## Residuals 385 59436 154.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classmin_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -9.024564 -13.534491 -4.514638 0.0000104
## CL3-CL1 -7.309367 -11.897598 -2.721135 0.0006007
## CL3-CL2 1.715198 -1.496459 4.926854 0.4207669
summary(a_class_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 644 322.0 35.55 2.64e-15 ***
## Residuals 590 5344 9.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.592120 1.8641533 3.3200858 0.0000000
## CL3-CL1 1.337224 0.6562435 2.0182050 0.0000145
## CL3-CL2 -1.254895 -2.0024589 -0.5073317 0.0002641
summary(a_classn_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 422 210.98 15.05 5.11e-07 ***
## Residuals 385 5398 14.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classn_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.6028601 1.2436788 3.962041 0.0000261
## CL3-CL1 3.2029326 1.8201521 4.585713 0.0000003
## CL3-CL2 0.6000725 -0.3678421 1.567987 0.3121099
summary(a_class_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 591 295.71 10.63 2.89e-05 ***
## Residuals 629 17502 27.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.059503 -2.238974 0.1199684 0.0885449
## CL3-CL1 -2.422248 -3.657506 -1.1869908 0.0000148
## CL3-CL2 -1.362746 -2.580036 -0.1454545 0.0237365
summary(a_class_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 463 231.36 6.735 0.00128 **
## Residuals 629 21609 34.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.3233714 -2.633946 -0.01279692 0.0471867
## CL3-CL1 -2.1065180 -3.479080 -0.73395634 0.0009792
## CL3-CL2 -0.7831466 -2.135745 0.56945151 0.3625740
Statistical information based on correlation test between position
relative to the tumor and amount of MG:
cor_18
##
## Pearson's product-moment correlation
##
## data: master_class_sum2$n_MG and master_class_sum2$distance_to_tumor
## t = -0.14404, df = 591, p-value = 0.8855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08639938 0.07462654
## sample estimates:
## cor
## -0.005924827
The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:
p_class_speed
p_class_disp2
p_class_movement
p_class_dist_10neigh
p_class_minMG
p_classmin_SR101
p_class_n_MG
p_classn_SR101
p_class_BV_min
p_class_BV_mean
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
### MULTIVARIATE
list_master_pca <- split(master_pca3[, -c(1)], master_pca3$Track2) ## split into list
if (!file.exists("matrix_distmat.rds")) {
## parallel working load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores() - 2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
### MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat <- as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
} else {
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2 <- as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience.
However, in this module, the output of the UMAP is not relevant, as it does not reflect any small-scale phenotype.
# Adjust parameters according to the characteristics of the experiment
umap_dist <- umap(matrix_distmat, n_components = 2, input = "dist", init = "random", n_neighbors = 7,
min_dist = 0.5, spread = 6, n_epochs = 1000, local_connectivity = 1)
umap_1 <- as.data.frame(umap_dist$layout)
Track2_umap <- cbind(Track2, umap_1)
positiontype <- master_norm[, c("Track2", "position", "class", "mouse")]
positiontype <- positiontype[!duplicated(positiontype$Track2), ]
umap_2 <- left_join(Track2_umap, positiontype)
# Clustering
set.seed(123)
n_cluster = 7
hc <- hclust(dist(umap_dist$layout, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1] <- "cluster2"
### calculate average stats per track
master_class <- left_join(master_cor2, umap_3[c(1:4)], by = c(Track2 = "Track2"))
master_class <- master_class %>%
filter(cluster2 != 0)
master_class$cluster2 <- as.numeric(master_class$cluster2)
master_class_sum <- master_class %>%
group_by(position, class, mouse, Track2) %>%
arrange(Time) %>%
summarize(cluster2 = mean(cluster2), V1 = mean(V1), V2 = mean(V2), dist_tumor = mean(dist_tumor),
dist_3_neigh = mean(dist_3_neigh), dist_10_neigh = mean(dist_10_neigh), speed = mean(speed),
disp2 = mean(disp2), disp_d = mean(disp_d), disp_l = mean(disp_l), movement = last(dist_tumor),
distance_to_tumor = last(dist_tumor_1), raw_dist_tumor = last(dist_tumor), Time = first(Time)) %>%
ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum, master_distance_MG[c("Track2", "n_MG", "min_MG")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, master_distance_SR101[c("Track2", "n_SR101", "min_SR101")],
by = c(Track2 = "Track2"))
master_class_sum <- left_join(master_class_sum, BV_df_sum, by = c(Track2 = "Track2"))
mid <- 0
master_class_sum$movement2 <- ifelse(master_class_sum$movement > 0, "away from tumor", ifelse(master_class_sum$movement ==
0, "no movement", "towards the tumor"))
master_class_sum <- master_class_sum[!is.na(master_class_sum$cluster2), ]
saveRDS(master_class_sum, "master_class_sum.rds")
write.csv(master_class_sum, "master_class_sum.csv")
In this section, we project the relevant small-scale features in a Heatmap
### Create heatmap
sum_all <- master_class_sum %>%
group_by(cluster2) %>%
summarise(displacement2 = median(disp2), speed = median(speed), displacement_delta = median(disp_d),
displacement_length = median(disp_l), max_speed = quantile(speed, 0.75), persistance = (displacement_length/displacement_delta),
nearest_10_cell = median(dist_10_neigh), dist_tumor_core = median(distance_to_tumor), n_SR101 = mean(n_SR101,
na.rm = T), min_SR101 = mean(min_SR101, na.rm = T), n_MG = mean(n_MG, na.rm = T), min_MG = mean(min_MG,
na.rm = T), min_BV_dist = mean(BV_min, na.rm = T), sd_BV_dist = mean(BV_sd, na.rm = T),
mean_BV_dist = mean(BV_mean, na.rm = T), movement = median(movement))
Cluster_movement <- sum_all[, c(1, length(names(sum_all)))]
Cluster_movement <- Cluster_movement[order(Cluster_movement$movement), ]
stdize = function(x, ...) {
(x - min(x, ...))/(max(x, ...) - min(x, ...))
}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df <- sum_all[, -c(1)]
df <- df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE] %>%
ungroup()
df <- as.data.frame(df)
rownames(df) <- Cluster_movement$cluster2
Heatmap of all the included features:
heat_m <- pheatmap(df, clustering_method = "complete", cluster_cols = F, cluster_rows = F, cellwidth = 20,
cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50", main = " All features",
angle_col = 90, color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
df_other_features <- df[, !colnames(df) %in% c("displacement2", "speed", "max_speed", "displacement_delta",
"displacement_length", "persistance", "movement")]
rownames(df_other_features) <- rownames(df)
Heatmap of the small-scale features:
pheatmap(df_other_features, clustering_method = "complete", cluster_cols = F, cluster_rows = F,
cellwidth = 20, cellheight = 20, treeheight_row = 0, treeheight_col = 0, fontsize = 8, na_col = "grey50",
main = "Small-scale TME features", angle_col = 90, color = colorRampPalette(c("deepskyblue1",
"grey95", "deeppink2"))(50))
Plots the behavioral clusters according to relevant small-scale
features
## plot clusters giving them a color according to direction Convert df to a data frame Add
## cluster2 as a column
df1 <- as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
## set first order clusters based on the heatmap:
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = n_SR101)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP number of SR101 in 30um radius distribution") +
theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1) + scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),
na.value = "NA")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = min_SR101)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP min distance to SR101 in 30um radius distribution") +
theme_light(base_size = 20) + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) +
theme(aspect.ratio = 1) + scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"),
na.value = "NA", trans = "pseudo_log")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = min_MG)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP min distance to MG in 30um radius distribution") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"), na.value = "NA", trans = "pseudo_log")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = n_MG)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP number of MG in 30um radius distribution") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = BV_mean)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP mean distance to Blood vessels") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA",
trans = "pseudo_log")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = BV_min)) + geom_point(size = 2, alpha = 0.6) +
xlab("") + ylab("") + ggtitle("UMAP mean contact with Blood vessels") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), na.value = "NA")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = dist_10_neigh)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP big neighbours distribution") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), trans = "pseudo_log")
ggplot(master_class_sum, mapping = aes(x = V1, y = V2, color = dist_3_neigh)) + geom_point(size = 2,
alpha = 0.6) + xlab("") + ylab("") + ggtitle("UMAP close neighbours distribution") + theme_light(base_size = 20) +
theme(axis.text.x = element_blank(), axis.text.y = element_blank()) + theme(aspect.ratio = 1) +
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")), trans = "pseudo_log")
This section analyzes the differences between clusters for small-scale features and provides statistical support in each case.
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)
#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_3_neigh, group = cluster2,
fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(alpha = 0.5, outlier.colour = NA) +
ggtitle("dist 3 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(10, 70))
a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2, aes(x = as.factor(cluster2), y = dist_10_neigh, group = cluster2,
fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("dist 10 neigh per cluster") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(20, 90))
a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data = master_class_sum2)
#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(cluster2), y = min_MG,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("distance to closest MG") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(2, 45))
a_min_MG <- aov(min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(cluster2),
y = min_SR101, group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) +
geom_boxplot(outlier.colour = NA, alpha = 0.5) + ggtitle("distance to closest SR101") + theme_classic() +
theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0,
50))
a_min_SR101 <- aov(min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)), aes(x = as.factor(cluster2), y = n_MG,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("n MG") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 15))
a_n_MG <- aov(n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)), aes(x = as.factor(cluster2), y = n_SR101,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("n SR101") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_nSR101 <- aov(n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
#### try an anova between clusters based on mean distance to BV
p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(cluster2), y = BV_mean,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_mean") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_BV_mean <- aov(BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
#### try an anova between clusters based on min distance to BV
p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_min,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.2, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_min") + theme_classic() + theme(aspect.ratio = 0.7) + scale_fill_manual(values = mypalette_1) +
coord_cartesian(ylim = c(0, 20))
a_BV_min <- aov(BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on contact to BV
p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_contact,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("BV_contact") + theme_classic() + scale_fill_manual(values = mypalette_1) +
theme(aspect.ratio = 0.7)
a_BV_contact <- aov(BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
#### try an anova between clusters based on sd distance to BV
p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(cluster2), y = BV_sd,
group = cluster2, fill = cluster2)) + geom_jitter(width = 0.3, alpha = 0.5) + geom_boxplot(outlier.colour = NA,
alpha = 0.5) + ggtitle("Standart deviation distance") + theme_classic() + theme(aspect.ratio = 0.7) +
scale_fill_manual(values = mypalette_1) + coord_cartesian(ylim = c(0, 1.7))
a_BV_sd <- aov(BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
Statistical information based on ANOVA and Tukey’s HSD (Honestly
Significant Difference) test:
summary(a_dist_3_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 3361 560.1 2.537 0.0192 *
## Residuals 974 215048 220.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-5 -6.93325357 -13.4118742 -0.4546329 0.0269198
## 1-5 -2.52894319 -8.9130363 3.8551500 0.9051609
## 4-5 -1.21175380 -7.5533433 5.1298357 0.9977331
## 2-5 -1.57231465 -7.3540749 4.2094456 0.9846451
## 3-5 -2.18592905 -8.4433504 4.0714923 0.9465650
## 6-5 -1.54163514 -8.1300708 5.0468005 0.9930844
## 1-7 4.40431038 -1.1310478 9.9396686 0.2210967
## 4-7 5.72149977 0.2352167 11.2077829 0.0344673
## 2-7 5.36093892 0.5326580 10.1892199 0.0184533
## 3-7 4.74732452 -0.6414478 10.1360969 0.1261580
## 6-7 5.39161843 -0.3782194 11.1614563 0.0849160
## 4-1 1.31718939 -4.0571405 6.6915193 0.9911286
## 2-1 0.95662853 -3.7440540 5.6573111 0.9967798
## 3-1 0.34301413 -4.9317358 5.6177641 0.9999958
## 6-1 0.98730804 -4.6761846 6.6508006 0.9986478
## 2-4 -0.36056085 -5.0033540 4.2822323 0.9999879
## 3-4 -0.97417525 -6.1974021 4.2490516 0.9980209
## 6-4 -0.32988134 -5.9454188 5.2856561 0.9999977
## 3-2 -0.61361440 -5.1407651 3.9135363 0.9996806
## 6-2 0.03067951 -4.9439818 5.0053408 1.0000000
## 6-3 0.64429391 -4.8760164 6.1646042 0.9998663
summary(a_dist_10_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 7913 1318.9 3.191 0.00417 **
## Residuals 974 402620 413.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 7-5 -9.6344697 -18.4991323 -0.7698071 0.0230608
## 1-5 -3.6079591 -12.3432802 5.1273619 0.8863472
## 4-5 -1.3846227 -10.0617862 7.2925408 0.9991826
## 2-5 -1.1854643 -9.0966165 6.7256879 0.9994300
## 3-5 -0.6943029 -9.2562997 7.8676938 0.9999844
## 6-5 -1.9677886 -10.9827104 7.0471333 0.9952617
## 1-7 6.0265105 -1.5474910 13.6005121 0.2210762
## 4-7 8.2498470 0.7429947 15.7566993 0.0205661
## 2-7 8.4490054 1.8424939 15.0555168 0.0031684
## 3-7 8.9401668 1.5667379 16.3135956 0.0065568
## 6-7 7.6666811 -0.2281577 15.5615199 0.0635258
## 4-1 2.2233364 -5.1303309 9.5770038 0.9735949
## 2-1 2.4224948 -4.0094244 8.8544140 0.9242927
## 3-1 2.9136562 -4.3037564 10.1310688 0.8969941
## 6-1 1.6401706 -6.1091566 9.3894977 0.9959999
## 2-4 0.1991584 -6.1535511 6.5518679 0.9999999
## 3-4 0.6903198 -6.4565940 7.8372336 0.9999560
## 6-4 -0.5831659 -8.2668762 7.1005444 0.9999894
## 3-2 0.4911614 -5.7033151 6.6856379 0.9999863
## 6-2 -0.7823242 -7.5891272 6.0244787 0.9998778
## 6-3 -1.2734856 -8.8268972 6.2799259 0.9988814
summary(a_min_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 2139 356.4 2.502 0.0213 *
## Residuals 586 83489 142.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 0.3505170 -6.340404 7.041438 0.9999988
## 1-5 -0.3832598 -6.420999 5.654479 0.9999963
## 4-5 -1.5160284 -8.181438 5.149381 0.9940069
## 2-5 -1.9014326 -7.574864 3.771999 0.9557741
## 3-5 -4.2509543 -10.496643 1.994735 0.4070059
## 6-5 2.2977584 -3.919563 8.515080 0.9299992
## 1-7 -0.7337768 -6.500626 5.033072 0.9997761
## 4-7 -1.8665454 -8.287600 4.554509 0.9781511
## 2-7 -2.2519496 -7.636193 3.132294 0.8792864
## 3-7 -4.6014713 -10.585691 1.382749 0.2579487
## 6-7 1.9472414 -4.007366 7.901849 0.9607289
## 4-1 -1.1327686 -6.869998 4.604461 0.9972483
## 2-1 -1.5181728 -6.065254 3.028908 0.9565741
## 3-1 -3.8676945 -9.111429 1.376040 0.3067384
## 6-1 2.6810182 -2.528896 7.890933 0.7312330
## 2-4 -0.3854042 -5.737912 4.967103 0.9999922
## 3-4 -2.7349259 -8.690608 3.220756 0.8234714
## 6-4 3.8137868 -2.112140 9.739713 0.4783595
## 3-2 -2.3495217 -7.169303 2.470259 0.7785444
## 6-2 4.1991910 -0.583773 8.982155 0.1285495
## 6-3 6.5487127 1.099167 11.998259 0.0074197
summary(a_min_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 2041 340.1 2.13 0.0492 *
## Residuals 381 60834 159.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 -2.2138531 -11.5286889 7.100983 0.9922923
## 1-5 2.1984583 -8.2429279 12.639845 0.9960137
## 4-5 5.1974167 -3.8840565 14.278890 0.6188840
## 2-5 2.7409704 -6.0133709 11.495312 0.9678883
## 3-5 2.6085417 -6.6482462 11.865330 0.9811190
## 6-5 3.2834311 -7.5667008 14.133563 0.9728941
## 1-7 4.4123115 -3.8492200 12.673843 0.6932885
## 4-7 7.4112699 0.9534442 13.869096 0.0129863
## 2-7 4.9548235 -1.0342371 10.943884 0.1801036
## 3-7 4.8223949 -1.8797300 11.524520 0.3355079
## 6-7 5.4972842 -3.2751566 14.269725 0.5099975
## 4-1 2.9989584 -4.9985346 10.996451 0.9244121
## 2-1 0.5425120 -7.0814783 8.166502 0.9999926
## 3-1 0.4100834 -7.7859436 8.606110 0.9999991
## 6-1 1.0849728 -8.8755543 11.045500 0.9999082
## 2-4 -2.4564464 -8.0756846 3.162792 0.8536983
## 3-4 -2.5888750 -8.9626862 3.784936 0.8924347
## 6-4 -1.9139856 -10.4382281 6.610257 0.9943324
## 3-2 -0.1324286 -6.0308016 5.765944 1.0000000
## 6-2 0.5424607 -7.6323814 8.717303 0.9999951
## 6-3 0.6748894 -8.0358899 9.385669 0.9999877
summary(a_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 174 28.966 2.919 0.00818 **
## Residuals 586 5814 9.922
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 0.4961538 -1.26952538 2.2618331 0.9816320
## 1-5 0.5061538 -1.08715591 2.0994636 0.9659206
## 4-5 0.6576293 -1.10131763 2.4165761 0.9262042
## 2-5 0.9185223 -0.57864979 2.4156943 0.5382961
## 3-5 1.9003707 0.25218456 3.5485569 0.0122082
## 6-5 0.2638009 -1.37689934 1.9045011 0.9991366
## 1-7 0.0100000 -1.51182411 1.5318241 1.0000000
## 4-7 0.1614754 -1.53298816 1.8559390 0.9999589
## 2-7 0.4223684 -0.99848934 1.8432262 0.9755294
## 3-7 1.4042169 -0.17496970 2.9834034 0.1186677
## 6-7 -0.2323529 -1.80372493 1.3390190 0.9994659
## 4-1 0.1514754 -1.36253240 1.6654832 0.9999452
## 2-1 0.4123684 -0.78756893 1.6123058 0.9501389
## 3-1 1.3942169 0.01043832 2.7779954 0.0469106
## 6-1 -0.2423529 -1.61720667 1.1325008 0.9985438
## 2-4 0.2608930 -1.15158984 1.6733759 0.9981051
## 3-4 1.2427415 -0.32891411 2.8143970 0.2270433
## 6-4 -0.3938284 -1.95763171 1.1699750 0.9896421
## 3-2 0.9818484 -0.29005219 2.2537491 0.2535156
## 6-2 -0.6547214 -1.91690635 0.6074636 0.7237356
## 6-3 -1.6365698 -3.07466034 -0.1984793 0.0141648
summary(a_nSR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 296 49.31 3.4 0.00279 **
## Residuals 381 5525 14.50
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 0.49254844 -2.314512 3.2996091 0.9985594
## 1-5 -0.92521994 -4.071771 2.2213309 0.9765467
## 4-5 -2.10242424 -4.839160 0.6343118 0.2578385
## 2-5 -1.51918265 -4.157336 1.1189709 0.6118362
## 3-5 -1.42471591 -4.214284 1.3648518 0.7364154
## 6-5 -1.87062937 -5.140357 1.3990985 0.6192901
## 1-7 -1.41776838 -3.907412 1.0718751 0.6245299
## 4-7 -2.59497268 -4.541063 -0.6488827 0.0017719
## 2-7 -2.01173109 -3.816557 -0.2069052 0.0178879
## 3-7 -1.91726434 -3.936975 0.1024461 0.0754140
## 6-7 -2.36317781 -5.006786 0.2804301 0.1143430
## 4-1 -1.17720430 -3.587279 1.2328702 0.7752505
## 2-1 -0.59396271 -2.891481 1.7035553 0.9879413
## 3-1 -0.49949597 -2.969399 1.9704075 0.9968096
## 6-1 -0.94540943 -3.947052 2.0562327 0.9669323
## 2-4 0.58324159 -1.110137 2.2766201 0.9490331
## 3-4 0.67770833 -1.243064 2.5984802 0.9429015
## 6-4 0.23179487 -2.337018 2.8006073 0.9999697
## 3-2 0.09446674 -1.683030 1.8719635 0.9999987
## 6-2 -0.35144672 -2.814966 2.1120726 0.9995587
## 6-3 -0.44591346 -3.070939 2.1791125 0.9988011
summary(a_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 201 33.58 0.96 0.452
## Residuals 625 21870 34.99
TukeyHSD(a_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 0.114380955 -2.8875803 3.116342 0.9999998
## 1-5 1.343411715 -1.9266146 4.613438 0.8882539
## 4-5 1.531024067 -1.4533920 4.515440 0.7343193
## 2-5 0.883749247 -1.9221234 3.689622 0.9673766
## 3-5 0.050925195 -3.1611584 3.263009 1.0000000
## 6-5 0.891688746 -2.5906524 4.374030 0.9887070
## 1-7 1.229030760 -1.5019173 3.959979 0.8371332
## 4-7 1.416643112 -0.9648993 3.798186 0.5760417
## 2-7 0.769368292 -1.3842162 2.922953 0.9402255
## 3-7 -0.063455760 -2.7247499 2.597838 1.0000000
## 6-7 0.777307791 -2.2045884 3.759204 0.9875916
## 4-1 0.187612352 -2.5240375 2.899262 0.9999938
## 2-1 -0.459662468 -2.9734697 2.054145 0.9982125
## 3-1 -1.292486520 -4.2528559 1.667883 0.8558759
## 6-1 -0.451722969 -3.7033388 2.799893 0.9996280
## 2-4 -0.647274820 -2.7763342 1.481785 0.9726408
## 3-4 -1.480098872 -4.1215860 1.161388 0.6447799
## 6-4 -0.639335321 -3.6035676 2.324897 0.9955226
## 3-2 -0.832824053 -3.2707814 1.605133 0.9516040
## 6-2 0.007939499 -2.7764554 2.792334 1.0000000
## 6-3 0.840763552 -2.3525756 4.034103 0.9869246
summary(a_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 400 66.73 2.357 0.0293 *
## Residuals 625 17693 28.31
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 1.29094761 -1.40914409 3.9910393 0.7941441
## 1-5 1.74500151 -1.19619930 4.6862023 0.5791495
## 4-5 2.72606132 0.04175054 5.4103721 0.0438061
## 2-5 2.21955044 -0.30417080 4.7432717 0.1272230
## 3-5 1.27030482 -1.61877987 4.1593895 0.8516710
## 6-5 0.60080414 -2.53136168 3.7329700 0.9976621
## 1-7 0.45405390 -2.00227697 2.9103848 0.9981000
## 4-7 1.43511371 -0.70694687 3.5771743 0.4273434
## 2-7 0.92860283 -1.00842274 2.8656284 0.7920614
## 3-7 -0.02064279 -2.41432395 2.3730384 1.0000000
## 6-7 -0.69014347 -3.37218782 1.9919009 0.9884104
## 4-1 0.98105981 -1.45791345 3.4200331 0.8979741
## 2-1 0.47454892 -1.78647630 2.7355741 0.9961461
## 3-1 -0.47469669 -3.13737889 2.1879855 0.9984505
## 6-1 -1.14419737 -4.06883904 1.7804443 0.9096646
## 2-4 -0.50651088 -2.42147752 1.4084558 0.9866053
## 3-4 -1.45575650 -3.83162238 0.9201094 0.5400894
## 6-4 -2.12525718 -4.79141380 0.5408994 0.2185533
## 3-2 -0.94924561 -3.14204820 1.2435570 0.8608218
## 6-2 -1.61874630 -4.12314953 0.8856569 0.4731006
## 6-3 -0.66950068 -3.54172579 2.2027244 0.9931665
summary(a_BV_contact)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1.46 0.2426 1.505 0.174
## Residuals 625 100.76 0.1612
TukeyHSD(a_BV_contact)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 0.006800382 -0.19695851 0.21055928 0.9999999
## 1-5 0.059229765 -0.16272411 0.28118364 0.9859693
## 4-5 0.091152403 -0.11141560 0.29372041 0.8372108
## 2-5 0.031521050 -0.15892828 0.22197038 0.9989834
## 3-5 -0.062548543 -0.28056955 0.15547246 0.9795994
## 6-5 0.102116602 -0.13424820 0.33848141 0.8619585
## 1-7 0.052429383 -0.13293442 0.23779318 0.9810314
## 4-7 0.084352021 -0.07729578 0.24599983 0.7182519
## 2-7 0.024720668 -0.12145444 0.17089577 0.9988507
## 3-7 -0.069348925 -0.24998495 0.11128710 0.9169368
## 6-7 0.095316220 -0.10708075 0.29771319 0.8056035
## 4-1 0.031922638 -0.15213129 0.21597657 0.9986737
## 2-1 -0.027708715 -0.19833403 0.14291660 0.9990877
## 3-1 -0.121778308 -0.32271414 0.07915753 0.5534977
## 6-1 0.042886837 -0.17781743 0.26359110 0.9974866
## 2-4 -0.059631353 -0.20414181 0.08487910 0.8861264
## 3-4 -0.153700946 -0.33299256 0.02559067 0.1483902
## 6-4 0.010964199 -0.19023383 0.21216223 0.9999985
## 3-2 -0.094069594 -0.25954658 0.07140739 0.6287125
## 6-2 0.070595552 -0.11839597 0.25958707 0.9265977
## 6-3 0.164665146 -0.05208357 0.38141386 0.2719348
summary(a_BV_sd)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 7.92 1.320 12.23 4.92e-13 ***
## Residuals 625 67.47 0.108
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 7-5 -0.243957825 -0.41069736 -0.07721829 0.0003506
## 1-5 -0.167763213 -0.34939202 0.01386560 0.0920777
## 4-5 -0.263551544 -0.42931655 -0.09778653 0.0000645
## 2-5 -0.288589329 -0.44443741 -0.13274125 0.0000013
## 3-5 -0.164469714 -0.34288018 0.01394076 0.0933383
## 6-5 0.070851456 -0.12257007 0.26427298 0.9328777
## 1-7 0.076194612 -0.07549189 0.22788111 0.7533175
## 4-7 -0.019593719 -0.15187300 0.11268556 0.9994615
## 2-7 -0.044631505 -0.16424920 0.07498619 0.9269728
## 3-7 0.079488111 -0.06832956 0.22730579 0.6884603
## 6-7 0.314809280 0.14918423 0.48043433 0.0000006
## 4-1 -0.095788331 -0.24640294 0.05482628 0.4936033
## 2-1 -0.120826117 -0.26045185 0.01879962 0.1403949
## 3-1 0.003293499 -0.16113587 0.16772287 1.0000000
## 6-1 0.238614668 0.05800844 0.41922090 0.0019826
## 2-4 -0.025037785 -0.14329327 0.09321770 0.9959558
## 3-4 0.099081830 -0.04763569 0.24579935 0.4171773
## 6-4 0.334402999 0.16975907 0.49904693 0.0000001
## 3-2 0.124119616 -0.01129315 0.25953238 0.0971041
## 6-2 0.359440785 0.20478566 0.51409591 0.0000000
## 6-3 0.235321169 0.05795183 0.41269050 0.0018590
The output in this section shows a boxplot for each small-scale feature
in each behavioral cluster
p_dist3_neigth
p_dist_10_neigh
p_min_MG
p_min_SR101
p_n_MG
p_nSR101
p_BV_mean
p_BV_min
p_BV_contact
p_BV_sd
This section analyzes the differences between large-scale regions for small-scale features and provides statistical support in each case.
#### Differences based on environmental clusters
### plot differences of the values per cluster
master_class_sum <- as.data.frame(master_class_sum)
master_class_sum$cluster2 <- as.character(master_class_sum$cluster2)
df1$mean_movement <- df1$movement
master_class_sum2 <- left_join(master_class_sum, df1[, c("cluster2", "mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels = df1$cluster2)
#### try an anova between clusters based on distance to dist_10_neigh
p_class_dist_10neigh <- ggplot(master_class_sum2, aes(x = as.factor(class), y = dist_10_neigh, fill = class)) +
geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("dist 10 neigh per cluster") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(20, 70))
a_class_dist_10neigh <- aov(dist_10_neigh ~ class, data = master_class_sum2)
#### try an anova between clusters based on dist to MG
p_class_minMG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = min_MG,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("distance to closest MG") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 50))
a_class_minMG <- aov(min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_classmin_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
y = min_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("distance to closest SR101") + theme_classic() + theme(aspect.ratio = 1) +
coord_cartesian(ylim = c(0, 50))
a_classmin_SR101 <- aov(min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_class_n_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = as.factor(class), y = n_MG,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("n of MG per cluster") +
theme_classic() + theme(aspect.ratio = 1)
a_class_n_MG <- aov(n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_classn_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)), aes(x = as.factor(class),
y = n_SR101, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("n SR101 per cluster") + theme_classic() + theme(aspect.ratio = 1)
a_classn_SR101 <- aov(n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on BV distance min
p_class_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)), aes(x = as.factor(class), y = BV_min,
fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median, geom = "crossbar",
size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold", "red")) + ggtitle("min distance to BV") +
theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0, 18))
a_class_BV_min <- aov(BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on BV distance mean
p_class_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)), aes(x = as.factor(class),
y = BV_mean, fill = class)) + geom_violin(alpha = 1, outlier.colour = NA) + stat_summary(fun = median,
geom = "crossbar", size = 1, color = "grey25") + scale_fill_manual(values = c("cyan", "gold",
"red")) + ggtitle("mean distance to BV") + theme_classic() + theme(aspect.ratio = 1) + coord_cartesian(ylim = c(0,
18))
a_class_BV_mean <- aov(BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
## relation between position relative to the tumor and amount of MG:
pcor_MG_position <- ggplot(subset(master_class_sum2, !is.na(min_MG)), aes(x = distance_to_tumor,
y = n_MG)) + geom_jitter() + geom_point(alpha = 0.5, stat = "summary") + geom_smooth(method = "lm") +
ggtitle("scatter plot movement vs min_MG") + theme_classic() + theme(aspect.ratio = 1)
cor_18 <- cor.test(master_class_sum2$n_MG, master_class_sum2$distance_to_tumor, method = "pearson",
use = "pairwise.complete.obs")
Statistical information between large-scale regions (environmental
clusters) based on ANOVA and Tukey’s HSD (Honestly Significant
Difference) test:
summary(a_class_dist_10neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 1320 659.9 1.577 0.207
## Residuals 978 409213 418.4
TukeyHSD(a_class_dist_10neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 -2.5151366 -6.376437 1.346164 0.2777655
## CL3-CL1 -0.1387336 -3.950374 3.672907 0.9959840
## CL3-CL2 2.3764030 -1.265054 6.017860 0.2764354
summary(a_class_minMG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2813 1406.4 10.02 5.26e-05 ***
## Residuals 590 82815 140.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_minMG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -5.075902 -7.941646 -2.210159 0.0001074
## CL3-CL1 -3.773329 -6.454107 -1.092551 0.0028689
## CL3-CL2 1.302574 -1.640317 4.245465 0.5519149
summary(a_classmin_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 3438 1719.0 11.13 1.99e-05 ***
## Residuals 385 59436 154.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classmin_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -9.024564 -13.534491 -4.514638 0.0000104
## CL3-CL1 -7.309367 -11.897598 -2.721135 0.0006007
## CL3-CL2 1.715198 -1.496459 4.926854 0.4207669
summary(a_class_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 644 322.0 35.55 2.64e-15 ***
## Residuals 590 5344 9.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.592120 1.8641533 3.3200858 0.0000000
## CL3-CL1 1.337224 0.6562435 2.0182050 0.0000145
## CL3-CL2 -1.254895 -2.0024589 -0.5073317 0.0002641
summary(a_classn_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 422 210.98 15.05 5.11e-07 ***
## Residuals 385 5398 14.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classn_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.6028601 1.2436788 3.962041 0.0000261
## CL3-CL1 3.2029326 1.8201521 4.585713 0.0000003
## CL3-CL2 0.6000725 -0.3678421 1.567987 0.3121099
summary(a_class_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 591 295.71 10.63 2.89e-05 ***
## Residuals 629 17502 27.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.059503 -2.238974 0.1199684 0.0885449
## CL3-CL1 -2.422248 -3.657506 -1.1869908 0.0000148
## CL3-CL2 -1.362746 -2.580036 -0.1454545 0.0237365
summary(a_class_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 463 231.36 6.735 0.00128 **
## Residuals 629 21609 34.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.3233714 -2.633946 -0.01279692 0.0471867
## CL3-CL1 -2.1065180 -3.479080 -0.73395634 0.0009792
## CL3-CL2 -0.7831466 -2.135745 0.56945151 0.3625740
Statistical information based on correlation test between position
relative to the tumor and amount of MG:
cor_18
##
## Pearson's product-moment correlation
##
## data: master_class_sum2$n_MG and master_class_sum2$distance_to_tumor
## t = -0.14404, df = 591, p-value = 0.8855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08639938 0.07462654
## sample estimates:
## cor
## -0.005924827
The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:
p_class_dist_10neigh
p_class_minMG
p_classmin_SR101
p_class_n_MG
p_classn_SR101
p_class_BV_min
p_class_BV_mean
## R version 4.3.3 (2024-02-29 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## Random number generation:
## RNG: L'Ecuyer-CMRG
## Normal: Inversion
## Sample: Rejection
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8
## [3] LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.utf8
##
## time zone: Europe/Madrid
## tzcode source: internal
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] zoo_1.8-12 viridis_0.6.5 viridisLite_0.4.2
## [4] umap_0.2.10.0 tidyr_1.3.1 spatstat_3.0-8
## [7] spatstat.linnet_3.1-5 spatstat.model_3.2-11 rpart_4.1.23
## [10] spatstat.explore_3.2-7 nlme_3.1-164 spatstat.random_3.2-3
## [13] spatstat.geom_3.2-9 spatstat.data_3.0-4 sp_2.1-4
## [16] scales_1.3.0 reshape2_1.4.4 readr_2.1.5
## [19] RANN_2.6.1 plotly_4.10.4 pheatmap_1.0.12
## [22] gtools_3.9.5 ggplot2_3.5.1 emmeans_1.10.2
## [25] e1071_1.7-14 dtwclust_5.5.12 dtw_1.23-1
## [28] proxy_0.4-27 dplyr_1.1.4 DT_0.33
##
## loaded via a namespace (and not attached):
## [1] deldir_2.0-4 gridExtra_2.3 formatR_1.14
## [4] rlang_1.1.3 magrittr_2.0.3 clue_0.3-65
## [7] flexclust_1.4-2 compiler_4.3.3 mgcv_1.9-1
## [10] png_0.1-8 vctrs_0.6.5 stringr_1.5.1
## [13] crayon_1.5.2 pkgconfig_2.0.3 fastmap_1.2.0
## [16] labeling_0.4.3 utf8_1.2.4 promises_1.3.0
## [19] rmarkdown_2.27 tzdb_0.4.0 bit_4.0.5
## [22] purrr_1.0.2 xfun_0.44 modeltools_0.2-23
## [25] cachem_1.1.0 jsonlite_1.8.8 goftest_1.2-3
## [28] highr_0.11 later_1.3.2 spatstat.utils_3.0-4
## [31] cluster_2.1.6 R6_2.5.1 bslib_0.7.0
## [34] stringi_1.8.4 RColorBrewer_1.1-3 reticulate_1.37.0
## [37] jquerylib_0.1.4 estimability_1.5.1 Rcpp_1.0.12
## [40] iterators_1.0.14 knitr_1.47 tensor_1.5
## [43] splines_4.3.3 httpuv_1.6.15 Matrix_1.6-5
## [46] tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5
## [49] yaml_2.3.8 codetools_0.2-20 lattice_0.22-6
## [52] tibble_3.2.1 plyr_1.8.9 shiny_1.8.1.1
## [55] withr_3.0.0 askpass_1.2.0 evaluate_0.23
## [58] RcppParallel_5.1.7 polyclip_1.10-6 pillar_1.9.0
## [61] foreach_1.5.2 stats4_4.3.3 shinyjs_2.1.0
## [64] generics_0.1.3 vroom_1.6.5 hms_1.1.3
## [67] munsell_0.5.1 xtable_1.8-4 class_7.3-22
## [70] glue_1.7.0 lazyeval_0.2.2 tools_4.3.3
## [73] data.table_1.15.4 RSpectra_0.16-1 mvtnorm_1.2-5
## [76] grid_4.3.3 crosstalk_1.2.1 colorspace_2.1-0
## [79] cli_3.6.2 spatstat.sparse_3.0-3 fansi_1.0.6
## [82] gtable_0.3.5 sass_0.4.9 digest_0.6.35
## [85] ggrepel_0.9.5 farver_2.1.2 htmlwidgets_1.6.4
## [88] htmltools_0.5.8 lifecycle_1.0.4 httr_1.4.7
## [91] mime_0.12 bit64_4.0.5 openssl_2.2.0
Thank you for reading the BEHAV3D Tumor Profiler - Guided Tutorial.